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high-level  protocol  programmed  using  explicit  te.sts  for  common  knowledge. 
In  the  second  step,  we  carefully  analyze  when  facts  become  common  knowl¬ 
edge,  thereby  providing  a  method  of  efficiently  implementing  these  protocols 
in  the  crash  failure  model  and  several  variants  of  the  omissions  failure  model. 
In  the  generalized  omissions  model,  however,  our  analysis  shows  that  testing 
for  common  knowledge  is  NP-hard.  Given  the  close  correspondence  between 
common  knowledge  and  simultaneous  actions,  we  are  able  to  show  that  no 
optimal  protocol  for  any  such  problem  can  be  computationally  efficient  in 
this  model.  Our  analysis  exposes  many  subtle  differences  between  the  failure 
models,  including  the  precise  point  at  which  this  gap  in  complexity  occurs. 
This  work  shows  how  knowledge  can  be  effectively  used  in  protocol  design 
and  in  proving  nontrivial  lower  bounds  on  computational  complexity. 

In  areas  like  cryptography,  probability  often  plays  a  role  in  understanding 
interesting  systems  of  agents,  yet  the  standard  definition  of  knowledge  used 
above  ignores  issues  of  probability.  Recent  papers  have  shown  that  more  than 
one  definition  of  probabilistic  knowledge  is  reasonable,  but  they  do  not  tell 
us  how  to  make  the  choice  between  these  definitions.  We  clarify  the  issues 
involved  in  making  the  right  choice.  We  show  that  no  single  definition  is  ap¬ 
propriate  in  all  contexts.  Given  a  particular  context,  however,  we  show  how 
to  construct  the  most  appropriate  definition  for  that  context,  where  “most 
appropriate”  is  made  precise  in  terms  of  betting  games  against  an  adversary. 
V/e  show  how  probabilistic  knowledge  can  be  used  to  specify  coordinated 
attack,  and  how  different  definitions  of  probabilistic  knowledge  result  in  dif¬ 
ferent  levels  of  guarantees  by  the  problem  statement.  Another  important 
aspect  of  cryptography  is  the  fact  that  an  agent’s  knowledge  (of  the  contents 
of  a  message,  for  example)  is  limited  by  the  bounds  on  its  computational 
power,  yet  the  standard  definition  of  knowledge  ignores  computational  com¬ 
plexity,  in  addition  to  probability.  We  show  how  such  issues  in  cryptography 
motivate  the  deffnition  of  practical  knowledge,  and  then  turn  to  the  problem 
of  using  probabilistic  and  practical  knowledge  to  reason  about  cryptography. 

While  the  intuition  underlying  a  zero  knowledge  proof  system  [GMR89} 
is  that  ao  “knowledge”  is  leaked  by  the  prover  to  the  verifier,  researchers 
are  just  beginning  to  analyze  such  cryptographic  systems  in  terms  of  formal 
notions  of  knowledge.  We  show  how  the  definition  of  an  interactive  proof 
system  can  be  characterized  directly  in  terms  of  practical  knowledge.  Using 


this  notion  of  knowledge,  wc  formally  capture  and  prove  the  intuition  that 


the  prover  does  not  leak  any  knowledge  of  any  fact  (other  than  the  fact  being 


proven)  during  a  zero  knowledge  proof  We  extend  this  result  to  show  that  the 
prover  does  not  leak  any  knowledge  of  how  to  compute  any  information  (such 
as  the  factorization  of  a  number)  during  a  zero  knowledge  proof.  Finally, 
we  show  how  our  knowledge-theoretic  characterization  of  interactive  proof 
systems  can  be  used  to  prove  simple  properties  of  such  systems.  This  work 
represents  a  first  step  toward  the  ultimate  goal  of  being  able  to  reason  about 
cryptographic  systems  directly  in  terms  of  knowledge,  reasoning  at  a  higher 
semantic  level  than  the  operational  cryptographic  definitions  themselves. 
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Understanding  systems  of  agents  that  interact  in  some  way  is  fundamental 
to  many  areas  of  science,  including  philosophy,  linguistics,  economics,  game 
theory,  logic,  artificial  intelligence,  robotics,  and  distributed  computing.  As 
we  try  to  understand  these  systems,  we  often  find  ourselves  reasoning  (at 
least  informally)  about  the  knowledge  these  agents  have  about  other  agents. 
Recent  work  has  shown  that  these  informal  notions  of  knowledge  can  be 
made  precise  in  the  context  of  computer  science.  In  this  thesis,  we  pro¬ 
vide  convincing  evidence  that  reasoning  in  terms  of  knowledge  can  lead  to 
general,  unifying  results  about  distributed  computation,  and  we  extend  the 
standard  definitions  of  knowledge  and  apply  them  in  new  contexts  such  as 
cryptography. 

Many  problems  in  the  literature  such  as  the  consensus  and  distributed 
firing  squad  problems  require  processors  in  a  synchronous  system  to  per¬ 
form  some  action  simultaneously,  yet  each  problem  is  solved  in  each  model 


of  processor  failure  using  a  different  algorithm.  We  give  a  single  algorithm 
scheme  with  which  we  can  transform  specifications  of  such  problems  directly 
into  protocols  that  are  optimal  in  a  very  strong  sense:  these  protocols  are 
optimal  in  all  runs,  which  means  that  given  any  possible  input  to  the  system 
and  any  possible  faulty  processor  behavior,  these  protocols  are  guaranteed  to 
perform  the  simultaneous  action  as  soon  as  any  other  protocol  would  do  so 


in  the  same  context.  In  contrast,  most  ollief  protocols  in  the  literature  are 


optimal  only  in  the  worst  case  run.  This  transformation  is  performed  in  two 


steps.  In  the  first  step,  we  extract  directly  from  the  problem  specification  a 


high-level  protocol  programmed  using  explicit  tests  for  common  knowledge. 
In  the  second  step,  we  carefully  analyze  when  facts  become  common  knowl¬ 
edge,  thereby  providing  a  method  of  efficiently  implementing  these  protocols 
in  the  crash  failure  model  and  several  variants  of  the  omissions  failure  model. 
In  the  generalized  omissions  model,  however,  our  analysis  shows  that  testing 
for  common  knowledge  is  NP-hard.  Given  the  close  correspondence  between 
common  knowledge  and  simultaneous  actions,  we  are  able  to  show  that  no 
optimal  protocol  for  any  such  problem  can  be  computationally  efficient  in 
this  model.  Our  analysis  exposes  many  subtle  differences  between  the  failure 
models,  including  the  precise  point  at  which  this  gap  in  complexity  occurs. 
This  work  shows  how  knowledge  can  be  effectively  used  in  protocol  design 
and  in  proving  nontrivial  lower  bounds  on  computational  complexity. 

In  areas  like  cryptography,  probability  often  plays  a  role  in  understanding 
interesting  systems  of  agents,  yet  the  standard  definition  of  knowledge  used 
above  ignores  issues  of  probability.  Recent  papers  have  shown  that  more  than 
one  definition  of  probabilistic  kriovAedge  is  reasonable,  but  they  do  not  tell 
us  how  to  make  the  choice  between  these  definitions.  We  clarify  the  issues 
involved  in  making  the  right  choice.  We  show  that  no  single  definition  is  ap¬ 
propriate  in  all  contexts.  Given  a  particular  context,  however,  we  show  how 
to  construct  the  most  appropriate  definition  for  that  context,  where  “most 
appropriate”  is  made  precise  in  terms  of  betting  games  against  an  adversary. 
We  show  how  probabilistic  knowledge  can  be  used  to  specify  coordinated 
attack,  and  how  different  definitions  of  probabilistic  knowledge  result  in  dif¬ 
ferent  levels  of  guarantees  by  the  problem  statement.  Another  important 
aspect  of  cryptography  is  the  fact  that  an  agent’s  knowledge  (of  the  contents 
of  a  message,  for  example)  is  limited  by  the  bounds  on  its  computational 
power,  yet  the  standard  definition  of  knowledge  ignores  computational  com¬ 
plexity,  in  addition  to  probability.  We  show  how  such  issues  in  cryptography 
motivate  the  definition  of  practical  knowledge,  and  then  turn  to  the  problem 
of  using  probabilistic  and  practical  knowledge  to  reason  about  cryptography. 

While  the  intuition  underlying  a  zero  knowledge  proof  system  [GMR89] 
is  that  no  “knowledge”  is  leaked  by  the  prover  to  the  verifier,  researchers 
are  just  beginning  to  analyze  such  cryptographic  systems  in  terms  of  formal 
notions  of  knowledge.  We  show  how  the  definition  of  an  interactive  proof 
system  can  be  characterized  directly  in  terms  of  practical  knowledge.  Using 
this  notion  of  knowledge,  we  formally  capture  and  prove  the  intuition  that 
the  prover  does  not  leak  any  knowledge  of  any  fact  (other  than  the  fact  being 
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proven)  during  a  zero  knowledge  proof.  We  extend  this  result  to  show  that  the 
prover  does  not  leak  any  knowledge  of  how  tc  compute  any  information  (such 
as  the  factorization  of  a  number)  during  a  zero  knowledge  proof.  Finally, 
we  show  how  our  knowledge-theoretic  characterization  of  interactive  proof 
systems  can  be  used  to  prove  simple  properties  of  such  systems.  This  work 
represents  a  first  step  toward  the  ultimate  goal  of  being  able  to  reason  about 
cryptographic  systems  directly  in  terms  of  knowledge,  reasoning  at  a  higher 
semantic  level  than  the  operational  cryptographic  definitions  themselves. 
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Chapter  1 
Introduction 


Today,  with  the  exception  of  home  personal  computers,  nearly  every  com¬ 
puter  is  part  of  a  larger  network  of  computers.  A  distributed  system  is  a 
collection  of  computers  (or  processors)  that  can  exchange  information  by 
sending  message  to  one  another  over  some  communication  network.  The 
motivation  behind  btiilding  a  distributed  system  may  be  as  simple  as  the 
desire  to  allow  people  working  at  the  computers  to  send  messages  to  each 
other,  or  to  share  the  use  of  a  common  printer.  A  more  sophisticated  reason 
for  doing  so  is  to  allow  the  computers  to  work  together  to  solve  a  problem. 


Unfortunately,  writing  the  program  to  solve  this  problem  is  often  quite 
difHcxilt.  This  is  usually  because  the  problem  is  defined  in  terms  of  the  global 
system  state,  whereas  an  individual  processor  must  base  its  actions  solely  on 
the  information  recorded  in  its  local  state,  typically  a  small  haction  of  the 
information  represented  by  the  global  state.  As  a  result,  a  processor  must 
base  its  actions  on  incomplete  knowledge  of  the  global  state.  The  limita¬ 
tions  of  what  a  processor  czm  know  about  the  global  state  is  a  fundamental 
source  of  difficulty  when  programming  distributed  systems.  It  often  feels 
quite  natured,  therefore,  to  reason  informally  about  distributed  computation 
in  terms  of  what  each  processor  knows.  The  primary  purpose  of  this  work 
is  to  explore  the  role  of  knowledge  in  the  design  and  analysis  of  distributed 
algorithms.  We  provide  some  convincing  evidence  that  reasoning  in  terms  of 
knowledge  can  yield  general,  unifying  results  about  distributed  computation, 
and  we  extend  the  standard  definitions  of  knowledge  in  order  to  apply  them 
in  new  contexts. 
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1.1  Motivation 

One  of  ihe  most  well-known  examples  of  informal  reasoning  about  knowledge 
when  thinking  about  distributed  computing  involves  the  coordinated  attack 
problem,  a  formulation  by  Gray  [Gra78]  of  a  folk  theorem  concerning  the 
impossibility  of  coordination  in  asynchronous  systems.  This  problem  is  de¬ 
fined  as  follows.  Two  generals  A  and  B  are  on  opposite  hills  with  a  common 
enemy  encamped  in  the  valley  between  them.  Neither  general  has  any  initial 
intention  of  attacking,  but  might  at  some  later  point  decide  to  attack  the  en¬ 
emy.  The  two  generals  must  attack  the  enemy  simultaneously,  however,  since 
a  general  attacking  by  himself  is  certain  to  be  destroyed.  Unfortunately,  the 
only  way  the  two  generals  C2ua  communicate  is  via  messengers  who  may  be 
captured  enroute  by  the  enemy.  The  coordinated  attack  problem  is  the  fol¬ 
lowing:  is  there  a  protocol  the  two  generals  can  follow  that  guarantees  both 
generals  attack  the  enemy  simultaneously  whenever  a  single  general  attacks? 

Gray  shows  that  the  only  such  protocol  is  one  in  which  neither  general 
attacks.  To  see  this,  suppose  P  is  a  protocol  for  coordinated  attack,  and 
suppose  there  is  an  execution  of  P  in  which  the  two  generals  attack  simulta¬ 
neously  after  exchanging  a  total  of  k  messages  (that  is,  after  dispatching  k 
messengers  who  may  or  may  not  have  successfully  delivered  their  messages). 
Consider  the  last  message  m  received  by  either  of  the  generals  before  the  at¬ 
tack.  Suppose  m  was  sent  by  general  A,  and  consider  the  instant  the  attack 
begins.  At  this  point,  A  doesn’t  know  whether  B  has  received  m  or  not,  but 
A  hsis  committed  himself  to  the  attack  in  either  case.  If  we  consider  the  exe¬ 
cution  differing  from  the  current  execution  only  in  that  B  does  not  receive  m, 
therefore,  we  see  that  A  zdso  attacks.  Since  P  guarantees  that  both  generals 
attack  whenever  a  single  general  attacks,  this  must  be  an  execution  of  this 
protocol  in  which  the  two  generals  attack  simtdtaneously  after  exchanging 
only  k  —  1  messages.  Continuing  by  induction,  we  see  that  if  there  is  any 
execution  of  P  in  which  the  two  generals  attack,  then  there  is  an  execution 
in  which  the  two  generals  attack  without  sending  any  messages.  But  if  no 
messages  are  sent,  then  B  cannot  possibly  know  of  A’s  intention  of  attacking, 
and  a  simultaneous  attack  is  impossible.  It  follows  that  the  only  protocols 
for  coordinated  attack  eire  protocols  in  which  neither  general  attacks! 

This  appeal  to  our  intuition  that  A  does  not  “know”  whether  B  has  re¬ 
ceived  m  seems  quite  natural.  Roughly  speaking,  from  A’s  point  of  view, 
there  are  two  global  states  consistent  with  the  information  recorded  in  A’s 
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local  state:  either  B  has  received  m,  or  the  messenger  carrying  m  was  cap¬ 
tured  by  the  enemy  and  B  has  not  received  m.  It  follows  that  A  cannot 
knovj  that  m  has  been  received,  since  it  is  possible  that  B  has  not  received 
m.  Philosophers  have  formalized  this  intuition  concerning  knowledge  as  early 
as  1962  with  Hintikka’s  possible  world  semantics  for  knowledge  [Hin62].  The 
basic  idea  is  that,  in  any  world  or  state  of  affairs,  a  processor  considers  a 
number  of  worlds  to  be  possible  in  addition  to  the  actual  world,  and  that  a 
processor  knows  a  fact  if  that  fact  is  true  in  all  worlds  the  processor  considers 
possible.  In  the  case  of  coordinated  attack,  for  example,  A  considers  at  least 
two  worlds  possible,  one  in  which  m  was  received  and  one  in  which  m  was 
not,  and  hence  cannot  be  said  to  know  m  has  been  received  since  one  of  the 
worlds  it  considers  possible  is  a  world  in  which  m  has  not  been  received. 

An  interesting  difference  between  the  use  of  knowledge  by  philosophers 
and  by  computer  scientists,  however,  is  that  computer  scientists  tend  to  be 
interested  in  the  knowledge  of  groups  of  processors  as  well  as  the  knowledge  of 
individual  processors.  For  example,  we  can  say  that  everyone  knows  a  fact 
if  every  processor  knows  the  fact  according  to  the  definition  of  knowledge 
given  above.  Another  interesting  state  of  knowledge  turns  out  to  be  the 
state  of  common  knowledge.  Roughly  speaking,  a  fact  is  common  knowledge 
if  everyone  knows  the  fact,  everyone  knows  that  everyone  knows  the  fact, 
and  so  on.  Such  definitions  of  knowledge  were  first  made  in  the  context  of 
distributed  computing  by  Halpern  and  Moses  [HM84]  (and  later  by  others 
[CM86,  FI86,  PR85]).  In  fact,  in  that  paper  they  give  a  formal  proof  of  the 
impossibility  of  coordinated  attack  directly  in  terms  of  knowledge.  They  show 
that  attaining  common  knowledge  of  a  certain  fact  is  a  necessary  condition  for 
the  generals  to  attack.  They  go  on  to  prove,  using  an  argument  very  similM 
to  the  combinatorial  argument  sketched  above,  that  it  is  impossible  to  attain 
common  knowledge  of  any  nontrivial  fact  in  (asynchronous)  systems  where 
messages  (or  messengers)  can  be  lost  or  indefinitely  delayed.  Combining  these 
results,  it  follows  that  coordinated  attack  is  impossible  in  such  systems. 

This  argument  is  a  rigorous  proof  that  captures  much  of  the  informal 
intuition  concerning  knowledge  in  the  proof  sketched  above.  In  distributed 
computing,  when  an  algorithm  or  an  impossibility  proof  is  sketched,  it  is 
often  the  appeal  to  our  intuition  concerning  knowledge  that  makes  the  pre- 
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natorial  arguments.  Halpern  and  Moses  made  a  fundamental  contribution  in 
showing  that  it  is  possible  to  make  rigorous  the  intuition  concerning  knowl¬ 
edge  we  use  informally  when  reasoning  about  distribute  algorithms.  As  a 
result,  they  made  significant  progress  toward  the  goal  of  making  explicit  rea¬ 
soning  about  knowledge  a  fundamental  tool  for  reasoning  about  distributed 
computation.  Part  of  the  motivation  for  this  work  is  to  make  further  progress 
toward  this  goal. 


1.2  Related  Work 

By  far  the  most  common  use  of  knowledge  in  distributed  computation  has 
been  to  prove  lower  bounds  and  impossibility  results.  A  fundamental  tech¬ 
nique  for  proving  lower  bounds  on  message  complexity  is  given  by  Chandy 
and  Misra  [CM86],  where  they  analyze  the  communication  complexity  re¬ 
quired  for  a  processor  to  reach  a  given  state  of  knowledge  in  an  asynchronous 
system.  Roughly  speaking,  they  show  that  if  at  time  t  processor  ti  does  not 
know  a  fact  <p,  and  at  a  later  time  t'  processor  im  knows  processor  t'm-i 
knows  . . .  processor  ii  does  know  then  some  sequence  (or  chain)  of  mes¬ 
sages  from  ti  to  t2  ...  to  im  must  have  occurred  between  times  t  and  t'. 
Using  this  result,  they  show  how  to  prove  lower  bounds  on  communication 
complexity  for  various  problems  such  as  mutual  exclusion  and  termination 
detection.  These  proofs  proceed  by  showing  that  a  certain  number  of  levels 
of  “processor  i  knows  processor  j  knows”  are  required  to  solve  the  problem, 
and  then  appealing  to  their  main  theorem  to  prove  that  any  protocol  solving 
the  problem  must  result  in  a  chain  of  messages  of  a  certain  length. 

Along  the  same  lines,  Moses  and  Roth  have  recently  performed  a  slightly 
more  sophisticated  analysis  in  (MR89]  where  they  study  the  problem  of  mes¬ 
sage  diffusion  in  asynchronous  systems  [SFC85],  the  problem  of  diffusing  a 
given  message  throughout  a  system  in  such  a  way  that  each  processor  “con¬ 
sumes”  the  message  exactly  once.  They  show  that  two  levels  of  knowledge 
are  sufficient  if  communication  in  the  system  is  not  required  to  subside,  and 
that  any  subsiding  protocol  must  either  attain  three  levels  of  knowledge  or 
use  three  different  types  of  messages.  Lower  bounds  on  message  complexity 
of  such  protocols  follow  immediately. 

Similarly,  in  [Had87],  Hadzilacos  studies  two-  and  three-phase  atomic 
commit  protocols  (used  in  the  context  of  transaction  processing  in  distributed 
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databases)  in  terms  of  knowledge,  and  characterizes  the  levels  of  knowledge 
required  for  a  site  to  commit  a  transaction  when  following  such  protocols.  As 
corollaries  of  these  characterizations,  he  is  able  to  show  that  no  nonblocking 
atomic  commit  protocol  can  tolerate  communication  failures,  and  he  is  able 
to  derive  a  known  lower  bound  (due  to  Dwork  and  Skeen  [DS83])  on  the 
number  of  messages  required  to  commit  a  transaction.  In  the  same  vein, 
Mazer  [Maz88,  Maz89]  performs  a  knowledge-theoretic  analysis  of  commit 
protocols  that  guarantee  that  all  participants  reach  a  consistent  decision  on 
the  commitment  of  a  transaction  in  systems  where  failed  sites  can  recover 
and  rejoin  the  system. 

As  with  coordinated  attack,  a  numbt '  of  impossibility  results  for  compu¬ 
tation  in  asynchronous  systems  follow  fron  ^he  fact  that  common  knowledge 
cannot  be  attsdned  in  such  systems.  Bu>  some  problems  can  be  solved  in 
asynchronous  systems.  This  implies  that  the  state  of  common  knowledge  is 
not  relevant  in  the  context  of  these  problems.  In  order  to  analyze  these  prob¬ 
lems,  therefore,  a  number  of  other  definitions  of  knowledge  such  as  eventual 
common  knowledge  and  time-stamped  common  knowledge  have  been  pro¬ 
posed  (see  [HM84]).  In  (PT88],  Panangaden  and  Taylor  define  the  notion 
of  concurrent  common  knowledge  and  show  how  several  problems  such  as 
finding  global  snapshots  [CL85]  of  the  global  system  state  can  be  analyzed 
in  terms  of  concurrent  common  knowledge. 

Just  as  important  as  lower  bounds  and  impossibility  results,  however,  is 
the  use  of  knowledge  in  the  actual  design  of  protocols.  The  motivation  for 
the  use  of  knowledge  in  protocol  design  is  that  a  processor’s  actions  must 
depend  on  what  it  knows.  When  a  protocol  tests  for  the  equality  of  two 
variables,  the  protocol  is  implicitly  testing  for  a  certain  state  of  knowledge. 
In  [HF88],  Halpern  and  Fagin  generalize  the  standard  notion  of  a  proto¬ 
col  by  defining  knowledge-based  protocols,  protocols  in  which  a  processor’s 
actions  may  explicitly  depend  on  tests  for  knowledge.  Such  protocols  typ¬ 
ically  include  explicit  tests  for  knowledge,  and  include  statements  such  as 
“if  processor  1  knows  processor  2  has  received  message  m,  then  perform 
action  o.”  Translating  a  knowledge-based  protocol  into  a  standard  pro¬ 
tocol,  therefore,  requirv<;s  implementing  the  embedded  tests  for  conditions 
such  as  “processor  1  knows  processor  2  has  received  m.”  The  advantage 


high-level  description  and  explanation  of  a  processor’s  behavior.  For  ex¬ 


ample,  Halpern  and  Zuck  construct  in  [HZ87]  a  family  of  knowledge-based 
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protocols  solving  the  sequence  transmission  problem  (the  problem  of  trans¬ 
mitting  a  sequence  of  bits  over  an  unreliable  channel),  emd  show  that  known 
solutions  [AUY79,  AUWY82,  BSW69,  Ste76]  to  the  sequence  transmission 
problem,  including  the  alternating  bit  protocol,  crm  be  viewed  as  particular 
instances  of  these  knowledge-based  protocols. 

Another  example  of  the  useful  level  of  abstraction  knowledge-based  pro¬ 
tocols  provide  is  the  work  of  Neiger  and  Toueg  in  [NT87].  They  construct  a 
broadcast  primitive  that  can  be  used  to  cause  certain  facts  to  become  “com¬ 
mon  knowledge”  in  systems  with  asynchronous  communication,  systems  in 
which  true  common  knowledge  cannct  be  attained.  Consequently,  using  this 
tool  (and  other  tools  developed  in  the  paper),  programmers  are  able  to  make 
simplifying  assumptions  when  they  design  protocols  by  assuming  common 
knowledge  of  certain  facts  is  attainable,  and  are  able  to  implement  these 
protocols  using  these  broadcast  primitives. 

The  first  significant  use  of  knowledge  in  the  design  of  new  protocols,  how¬ 
ever,  is  the  work  of  Dwork  and  Moses  in  [DM86].  They  study  the  problem 
of  simultaneous  Byzantine  agreement  [PSL80,  PisSS]  in  which  each  processor 
starts  with  an  initial  input  bit,  and  all  processors  are  reqtured  to  come  to 
agreement  on  a  final  output  bit  simultaneously  at  some  later  time.  They 
analyze  this  problem  in  synchronous  systems  with  the  crash  failure  model, 
a  simple  failure  model  in  which  a  processor  may  crash  in  the  middle  of  an 
execution  and  never  again  participate  in  that  execution.  They  show  that  in 
such  systems  common  knowledge  of  a  certain  fact  is  a  necessary  and  sufE- 
cient  condition  for  processors  to  reach  agreement.  Using  this  observation, 
they  construct  a  knowledge-based  protocol  that  is  optimal  in  a  very  strong 
sense:  this  protocol  is  optimal  in  all  runs,  which  means  that  given  any  pos¬ 
sible  input  to  the  system  and  any  possible  faulty  processor  behavior,  this 
protocol  is  guaranteed  to  reach  consensus  soon  as  any  other  protocol  would 
do  so  in  the  same  context.  In  contrast,  most  protocols  in  the  literature 
perform  in  every  run  only  as  well  as  they  do  in  their  worst  case  nm.  The 
protocol  constructed  in  [DM86]  for  agreement,  for  example,  can  halt  in  as 
few  as  two  rounds  of  commurucation,  much  sooner  that  most  known  proto¬ 
cols.  They  then  construct  polynomial-time  implementations  of  the  tests  for 
common  knowledge  embedded  in  their  knowledge-based  protocol,  resulting 
in  a  standard  (optimal)  protocol  for  agreement. 
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1.3  Thesis  Contributions 

The  results  of  Dwork  and  Moses  are  the  springboard  for  the  first  half  of  this 
work.  In  Chapter  3,  we  generalize  their  work  in  several  dimensions. 

While  Dwork  and  Moses  show  how  to  construct  optimal  protocols  for 
agreement,  implicit  in  their  work  is  a  technique  for  constructing  optimal 
protocols  for  many  other  problems  such  as  the  distributed  firing  squad  prob¬ 
lem,  problems  in  which  processors  are  required  to  choose  and  perform  the 
same  action  simultaneously.  In  order  to  make  this  precise,  we  define  the  gen- 
ersd  class  of  simultaneous  choice  problems.  Problems  in  this  class,  including 
the  agreement  and  distributed  firing  squad  problems,  require  processors  to 
choose  and  perform  a  simultaneous  action,  an  action  (such  as  deciding  on 
the  value  of  an  output  bit)  that  must  be  performed  simultaneously  by  all 
processors  whenever  it  is  performed  by  any  processor.  In  the  literature,  each 
combination  of  a  simultaneous  choice  problem  and  a  failure  model  results  in 
a  different  algorithm.  In  contrast,  we  give  a  single  tdgorithm  scheme  with 
which  we  can  transform  specifications  of  such  problems  directly  into  protocols 
that  are  optimal  in  all  runs,  in  the  sense  of  Dwork  and  Moses,  in  a  number 
of  failure  models.  This  trsinsformation  is  performed  in  two  steps.  In  the  first 
step,  we  extract  directly  from  the  problem  specification  a  high-level  proto¬ 
col  programmed  using  explicit  tests  for  common  knowledge.  In  the  second 
step,  we  carefully  analyze  when  facts  become  common  knowledge,  resulting 
in  efiicient  implementations  of  the  tests  for  common  knowledge  embedded  in 
this  high-level  protocol,  and  consequently  providing  a  method  for  efiiciently 
implementing  these  protocols. 

The  high-level,  knowledge-based  protocols  we  construct  are  similar  to  the 
protocol  given  by  Dwork  and  Moses.  The  technical  analysis  we  perform  in 
order  to  implement  the  embedded  tests  for  common  knowledge,  however, 
is  quite  different.  The  analysis  of  Dwork  and  Moses  makes  strong  use  of 
particular  properties  of  the  crash  failure  model  and  does  not  extend  to  more 
complicated  failure  models.  In  contrast,  our  analysis  applies  to  both  the  crash 
failure  model  and  several  variants  of  the  omissions  failure  model,  a  model  in 
which  faulty  processors  may  intermittently  fail  to  send  messages,  instead  of 
crashing  at  some  point  and  falling  silent  from  then  on.  Interestingly,  our 
techniques  for  implementing  tests  for  common  knowledge  are  purely  combi¬ 
natorial.  As  a  result,  our  work  is  a  nice  example  of  how  knowledge-theoretic 
and  combinatorial  reasoning  can  be  used  together  in  protocol  design:  think- 
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ing  in  terms  of  knowledge  allows  us  to  isolate  the  heart  of  a  problem,  which 
can  in  turn  be  solved  using  combinatorial  methods. 

Given  that  similar  knowledge-based  protocols  yield  optimal  protocols  for 
agreement  in  both  the  crash  and  omissions  failure  models,  one  might  hope 
that  the  same  protocol  wotild  work  in  even  more  malicious  models  like  the 
Byzantine  model  where  faulty  processors  are  allowed  to  behave  in  an  arbi¬ 
trary  fashion.  We  are  able  to  show,  however,  that  this  is  quite  unlikely.  We 
consider  a  variant  of  the  omissions  model  called  the  generalized  omissions 
model  in  which  faulty  processors  may  intermittently  fail  both  to  send  and  to 
receive  messages.  In  this  model,  we  show  that  the  same  knowledge-based  pro¬ 
tocol  is  an  optimal  protocol  for  performing  simultaneous  actions,  but  that 
implementing  tests  for  common  knowledge  in  this  model  is  suddenly  NP- 
hard!  In  fact,  using  the  close  correspondence  between  common  knowledge 
and  the  performance  of  simultaneous  actions,  we  are  able  to  show  that  any 
protocol  for  performing  simultaneous  actions  in  this  model  that  is  optimal 
in  all  runs  must  require  processors  to  perform  NP-hard  computations.  This 
means,  for  example,  that  there  can  be  no  optimal,  polynomial-time  protocol 
for  agreement,  assuming  P^NP.  Our  analysis  exposes  many  subtle  differ¬ 
ences  between  the  failure  models  we  consider,  including  the  precise  point  at 
which  this  gap  in  complexity  occurs.  This  work  shows  how  knowledge  can  be 
effectively  used  in  protocol  design,  as  does  the  work  of  Dwork  and  Moses,  but 
it  also  shows  how  knowledge  can  be  used  to  prove  nontrivial  lower  bounds 
on  computational  complexity. 

One  consequence  of  this  work  is  that  it  shows  for  the  first  time  that 
definitions  of  knowledge  must  take  computational  complexity  into  account 
even  when  analyzing  simple  problems  in  relatively  simple  failure  models, 
and  even  when  issues  of  computational  complexity  have  not  been  introduced 
Mtificiedly  via  cryptographic  assumptions.  In  general,  however,  there  axe 
many  situations  in  which  the  standard  definition  of  knowledge  does  not  seem 
appropriate.  One  of  the  important  contributions  of  this  thesis  is  to  improve 
our  understanding  of  how  to  define  notions  of  knowledge  for  use  in  these 
contexts.  This  is  the  topic  of  the  second  half  of  this  thesis. 

One  context  in  which  the  standard  definition  of  knowledge  does  not  seem 
particularly  appropriate  is  the  context  of  probabilistic  protocols.  Such  pro¬ 
tocols  are  quite  important  in  computer  science  since  there  are  a  number  of 
problems  (such  as  testing  for  primality  [RabSO])  that  we  can  solve  probabilis¬ 
tically  but  not  deterministically,  and  we  would  like  to  be  able  to  reason  about 
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these  protocols  in  terms  of  knowledge,  too.  Probabilistic  protocols,  however, 
typically  guarantee  that  certain  conditions  hold  only  with  high  probability, 
and  not  with  certainty.  Consequently,  while  a  processor  may  not  know  a 
given  fact  is  true,  it  may  be  quite  confident  the  fact  is  true.  In  [FH88],  Fagin 
and  Halpern  give  a  general  framework  in  which  it  is  possible  to  define  an 
entire  family  of  definitions  of  knowledge,  called  prohabilistic  knowledge,  that 
incorporate  knowledge  and  probability.  Their  idea  essentially  depends  on 
being  able  to  assign  probability  spaces  to  the  various  processors  to  use  when 
computing  their  “confidence”  that  a  given  fact  is  true.  They  do  not  tell  us, 
however,  which  assignment  to  use. 

In  Chapter  4,  we  show  how  to  construct  the  “best”  assignment  of  prob¬ 
ability  spaces,  and  hence  the  “best”  definition  of  probabilistic  knowledge. 
Surprisingly,  however,  one  of  our  main  observations  is  that  there  is  no  single 
definition  of  probabilistic  knowledge  that  is  most  appropriate  in  all  contexts. 
More  precisely,  we  show  that  the  various  definitions  of  probabilistic  knowl¬ 
edge  can  best  be  understood  in  terms  of  betting  games  and  betting  against 
different  adversaries.  We  show  how  different  adversaries  lead  to  different 
definitions  of  probabilistic  knowledge,  and  given  a  particular  adversary,  we 
show  how  to  construct  the  “best”  definition  of  probabilistic  knowledge  for 
this  particular  adversary  (where  “best”  is  made  precice  in  terms  of  betting 
games).  In  addition,  we  show  how  definitions  of  probabilistic  knowledge  can 
be  used  to  analyze  probabilistic  protocols:  we  give  a  specification  of  a  proba¬ 
bilistic  version  of  coordinated  attack  in  terms  of  probabilistic  knowledge,  and 
then  show  how  different  definitions  of  probabilistic  knowledge  (correspond¬ 
ing  to  increasingly  powerful  adversaries)  result  in  problem  specifications  with 
increasingly  powerful  correctness  conditions. 

Another  context  in  which  the  standard  definition  of  knowledge  does  not 
seem  particularly  appropriate  is  when  it  is  important  to  recognize  the  bounds 
on  processors’  computational  resources.  The  standard  definition  of  knowl¬ 
edge  essentially  says  that  a  processor  knows  any  fact  that  follows  from  the 
information  in  its  local  state,  regardless  of  the  complexity  of  computing  that 
fact.  In  the  context  of  cryptography,  for  example,  the  assumption  that  a 
polynomial-time  processor  cannot  factor  a  random  integer  and  he.uce  can¬ 
not  know  its  factorization  is  often  crucial  to  the  security  of  cryptographic 
protocols.  In  fact,  cryptographic  protocols  are  interesting  because  they  typi¬ 
cally  combine  both  the  use  of  probability  and  the  use  of  complexity-theoretic 
assumptions,  meaning  that  a  definition  of  knowledge  useful  in  the  context 
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of  cryptography  will  have  to  incorporate  both  probability  and  bounds  on 
processors’  computational  resources. 

Two  types  of  cryptographic  protocols  that  have  received  au  enormous 
amount  of  attention  recently  are  interactive  and  zero  knowledge  proof  sys¬ 
tems  [GMR89].  The  intuition  underlying  a  zero  knowledge  proof  system 
is  that  a  “prover”  would  like  to  convince  a  “verifier”  that  a  certain  fact  is 
true  without  leaking  any  “knowledge”  of  amy  other  fact  to  the  verifier  in 
the  process.  Interestingly,  while  this  intuition  is  closely  related  to  notions  of 
knowledge,  the  cryptographic  definitions  of  such  proof  systems  do  not  make 
any  explicit  reference  to  knowledge. 

In  Chapter  5,  we  explore  definitions  of  knowledge  that  incorporate  both 
probability  and  bounds  on  processors’  computationad  powers.  In  particular, 
we  show  how  interactive  proof  systems  motivate  a  new  notion  of  practical 
knowledge.  We  then  characterize  the  definition  of  an  interactive  proof  system 
directly  in  terms  of  practical  knowledge.  Using  this  definition  of  knowledge, 
we  capture  the  intuition  that  the  verifier  learns  essentiadly  nothing  ais  a  result 
of  a  zero  knowledge  proof,  other  than  the  fact  the  prover  initially  sets  out  to 
prove.  Finally,  using  these  characterizations,  we  sketch  an  example  of  how  to 
prove  simple  properties  of  such  proof  systems  directly  in  terms  of  knowledge. 
This  work  represents  a  first  step  toward  the  ultimate  goal  of  being  able  to 
reason  about  cryptographic  systems  directly  in  terms  of  knowledge,  reason¬ 
ing  at  a  higher  semantic  level  than  the  operational  cryptographic  definitions 
themselves.  In  addition,  this  work  sheds  some  light  on  issues  concerning  def¬ 
initions  of  knowledge  (like  practical  knowledge)  that  account  for  processors’ 
limited  computational  resources. 


Chapter  2 

Knowledge  and 
Common  Knowledge 

In  this  work,  we  will  study  systems  of  agents  that  interact  in  some  way, 
typically  to  solve  a  problem.  While  the  precise  meaning  of  an  agent  will 
depend  on  the  system  under  consideration  (an  agent  may  be  a  processor  in  a 
distributed  system  or  a  consumer  in  an  economic  model),  the  meaning  should 
always  be  clear  from  context.  The  purpose  of  this  chapter  is  to  review  the 
standard  definitions  of  what  it  means  for  such  an  agent  to  “know”  something. 

2.1  Systems  of  Agents 

We  begin  with  a  formal  model  of  a  system  of  agents.  Our  model  is  essentially 
that  of  [HF88],  a  simplification  of  [HM84]. 

Consider  a  system  of  n  interacting  agents  Pi, . . .  ,Pn  (we  will  sometimes 
denote  agents  with  letters  like  p  and  q).  Loosely  speaking,  an  interaction  of 
these  agents  is  uniquely  determined  by  the  sequence  of  global  states  through 
which  the  system  passes  in  the  course  of  the  interaction.  Formally,  a  global 
state  is  an  (n  +  l)-tupie  (a*,  si, . . . , Sn)  of  local  states,  where  a,-  is  the  local 
state  of  agent  p,-  (also  called  pj’s  view)  and  Se  is  the  state  of  the  environment. 

Much  of  this  chapter’s  presentation  comes  £com  joint  work  with  Yoram  Moses  [MT86, 
MT88],  which  was  in  turn  patterned  after  [HM84,  DM86].  Although  the  notion  of  an 
indezical  set  was  first  defined  in  [MT86,  MT88],  the  basic  ideas  used  in  the  proofs  in  this 
chapter  have  appeared  elsewhere  [HM84,  HM85]. 
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Intuitively,  the  state  of  the  environment  is  intended  to  capture  everything 
relevant  to  the  state  of  the  system  that  cannot  be  deduced  from  the  agents’ 
local  states.  In  a  message-passing  system,  for  example,  the  state  of  the 
environment  might  include  a  message  buffer  for  each  processor  in  the  system, 
containing  the  messages  sent  to  the  processor  but  not  yet  delivered.  A  run  is 
an  infinite  sequence  of  global  states;  numbering  the  states  from  0  to  infinity, 
we  think  of  the  fcth  global  state  as  the  global  state  at  time  k.  Intuitively,  a 
run  is  a  complete  description  of  one  possible  interaction  of  the  system  agents. 
A  system  is  simply  a  set  of  runs,  describing  the  set  of  all  possible  interactions 
of  the  system  agents  (possibly  the  set  of  all  possible  executions  of  a  given 
protocol,  for  example).  We  denote  the  global  state  at  time  k  in  run  r  by  r(fc), 
the  local  state  of  Pi  in  r{k)  by  ri{k)  (when  denoting  pi  by  g,  we  denote  g’s 
local  state  by  r,(fc)),  and  the  state  of  the  environment  in  r(fc)  by  re{k).  We 
refer  to  the  ordered  priir  (r,  k)  consisting  of  a  run  r  and  a  time  as  a  point 
We  say  that  a  point  (r,  k)  is  a  point  of  a  system  7?.  iff  r  is  a  run  of  TZ,  and  we 
frequently  abuse  notation  and  write  (r,  k)  SlZio  denote  the  fact  that  (r,  k) 
is  a  point  of  TZ.  Finally,  for  notational  convenience,  we  often  denote  points 
with  letters  like  c  or  d. 

We  typically  assume  that  all  agents  in  a  system  are  following  some  sort 
of  protocol  which,  roughly  speaking,  determines  an  agent’s  behavior  as  some 
function  of  its  local  state.  This  assumption  is  particularly  important  in  Chap¬ 
ters  3  and  5.  Since  the  systems  considered  in  these  chapters  are  synchronous, 
we  now  give  a  general  definition  of  a  protocol  in  a  synchronous  system  which 
will  be  refined  later  in  these  chapters. 

To  motivate  the  definition  of  a  protocol,  consider  the  following  informal 
description  of  computation  in  a  synchronous  system  of  agents  following  a 
protocol  P.  Computation  begins  in  an  initial  state  at  time  0  and  proceeds  in 
a  sequence  of  rounds,  with  round  k  lasting  from  time  k  through  k  +  1  (time 
k  is  considered  to  be  part  of  the  preceding  round  k  —  1).  Round  k  consists 
of  three  phases.  First,  each  agent  performs  some  action  (such  as  deciding 
on  an  output  value)  and  sends  messages  to  other  agents  in  the  system  as 
determined  by  the  protocol  P  and  its  local  state  at  time  A:  —  1.  Next,  each 
agent  receives  all  messages  sent  to  it  during  round  k  by  other  agents  in  the 
system.  Finally,  each  agent  changes  its  local  state  as  determined  by  the 
protocol  P,  its  local  state  at  time  —  1,  and  the  messages  it  received  during 
round  k. 

Formally,  therefore,  a  protocol  is  a  tuple  of  local  protocols,  one  for  each 
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agent.  A  local  protocol  foi  an  agent  consists  of  three  components:  a  function 
called  an  action  protocol  that  maps  a  local  state  to  an  action  a,  where  a  is 
intuitively  the  action  the  agent  is  to  perform  in  the  local  state;  a  function 
called  a  message  protocol  that  maps  a  local  state  to  a  list  of 

messages,  where  mj  is  intuitively  the  message  to  be  sent  to  pi  in  the  local 
state;  and  a  function  called  a  state  protocol  that  maps  a  local  state  and  a  list 
mi, . . .  ,mn  of  n  messages  to  another  local  state,  where  m{  is  intuitively  the 
message  just  received  from  p,-.  A  protocol  is  a  deterministic  protocol  if  these 
functions  are  deterministic,  and  a  probabilistic  protocol  if  these  functions  are 
probabilistic.  We  implicitly  associate  with  a  protocol  a  collection  of  global 
states  called  initial  states. 

A  run  r  of  a  protocol  P,  sketched  informally  above,  can  be  captured  in 
terms  of  our  formal  definition  of  a  run  as  follows.  The  global  state  of  r 
at  time  0  is  an  initial  state.  The  local  state  of  agent  pi  at  time  k  >  0  is 
determined  as  follows:  first,  for  each  agent  pj,  apply  p/s  message  protocol 
to  its  local  state  at  time  ib  —  1  to  determine  what  message  pj  sends  to  p,* 
during  round  k,  and  then  apply  p^’s  state  protocol  to  its  local  state  at  time 
k  —  1  and  this  set  of  messages  to  determine  p^’s  local  state  at  time  k.  It  is 
technically  convenient  to  assume  that  the  state  of  the  environment  at  each 
time  k>0  encodes  the  protocol  P  and  the  history  of  the  run  through  time 
ft,  where  the  history  is  a  list  of  ft  +  1  n-tuples  giving  the  local  state  of  each 
processor  at  each  time  from  0  through  ft.  Given  a  protocol  P  and  a  run  r  as 
defined  above,  we  say  that  r  is  a  run  of  P.  We  note,  however,  that  in  later 
chapters  it  will  be  necessary  to  elaborate  this  definition  of  a  run  of  P.  For 
example,  in  Chapter  3  agents  will  be  able  to  receive  messages  &om  sources 
outside  the  system  in  addition  to  agents  within  the  system.  Furthermore,  in 
that  chapter  we  vrill  consider  unreliable  systems  in  which  some  messages  may 
fail  to  be  delivered,  meaning  that  the  global  state  at  time  ft  is  not  necessarily 
uniquely  determined  by  the  global  state  at  time  ft  —  1  as  defined  above. 


2.2  Definition  of  Knowledge 

Having  defined  a  system  of  agents,  let  us  fix  a  given  system  71  for  the  re¬ 
mainder  of  this  section.  We  are  now  in  a  position  to  say  what  it  means  for 
an  agent  of  71  to  know  that  a  given  fact  is  true. 

Before  we  do  so,  however,  we  must  say  what  we  mean  by  a  fact.  Infor- 
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maily,  a  fact  is  an  assertion  that  is  either  true  or  false  at  a  point.  Formally, 
we  identify  a  fact  <p  with  a  set  of  points  of  TZ,  intuitively  the  set  of  points  at 
which  (p  is  true,  and  we  write  c  |=  ^  iff  ^  is  true  at  c. 

The  basic  intuition  behind  the  definition  of  knowledge  [HM84]  is  that  piS 
local  state  at  c  captures  all  the  information  p,-  has  about  the  system  at  c. 
If  Pi  has  the  same  local  state  at  two  points  c  and  d,  then  at  point  c  agent 
Pi  cannot  distinguish  between  c  and  d  and  must  consider  both  as  possible 
candidates  for  the  current  point.  If  a  fact  (p  is  true  at  c  but  false  at  d,  then 
Pi  cannot  be  said  to  know  at  c  that  <p  is  true  since  it  is  possible,  from  p{’s 
point  of  view,  that  the  current  point  is  actually  d  where  (p  is  false,  and  not 
c.  This  intuition  leads  us  to  say  that  p,-  considers  d  possible  at  c  if  pi  has 
the  same  local  state  at  c  and  d  (that  is,  pi  considers  (r',  k')  possible  at  (r,  k) 
if  ri{k)  =  rKk')),  and  that  pi  knows  a  fact  y?  at  c  if  is  true  at  all  points 
Pi  considers  possible  at  c.  In  other  words,  p;  knows  ^  iff  v?  is  guaranteed 
to  hold,  given  the  information  recorded  in  pi’s  local  state.  We  denote  “pi 
considers  d  possible  at  c”  by  c  d,  and  “pf  knows  <p  at  c”  by  c  |=  Ki<p.  It 
follows  that 

c  1=  Kitp  iff  d  1=  9?  for  all  d  6  7^  satisfying  c  d. 

Notice  that  p{’s  knowledge  depends  on  the  system  %,  since  %  restricts  the 
set  of  points  p,-  considers  possible.  Typically,  however,  the  system  will  be 
clear  from  context.  When  the  system  is  not  clear  from  context,  we  write 
7^,  c  1=  Ki(p  instead  of  c  |=  Kitp. 

Many  times  we  are  interested  in  the  knowledge  not  just  of  an  individual 
agent,  but  groups  of  agents.  A  straightforward  generalization  of  an  individ¬ 
ual  agent’s  knowledge  is  implicit  knowledge  [HM84]  (also  called  distributed 
knowledge).  The  intuition  here  is  that,  just  as  an  individual  agent  considers 
many  points  possible  at  c,  a  group  of  agents  pooling  together  all  the  infor¬ 
mation  they  have  about  the  system  may  also  consider  a  number  of  different 
points  possible;  and  just  as  the  individual  agent  knows  ^  if  ^  holds  at  all 
points  it  considers  possible,  the  group  of  agents  implicitly  knows  ^  if  ^  holds 
at  all  points  the  group  jointly  considers  possible.  Formally,  we  define  the 
joint  view  of  a  group  G  of  agents  at  a  point  (r,  k)  by 

^G{k)  {{Pi,r<(A)) :  Pi  e  G} . 

Roughly  speaking,  G’s  view  is  simply  the  joint  view  of  its  members.  We  note 
that  it  is  important  to  take  this  joint  view  to  be  ordered  pairs  of  the  form 
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{pijri{k))  since  we  have  not  said  an  agent’s  local  state  contains  its  identity, 
and  we  want  roik)  =  rQ{k')  to  mean  every  agent  in  G  has  the  same  local 
state  in  r{k)  and  r'(fc').  We  say  a  group  G  considers  a  point  d  possible  at 
c  if  every  agent  in  G  considers  d  possible  at  c;  that  is,  G  considers  (r',  k') 
possible  at  (r,  k)  iff  roik)  =  rQ{k*).  We  denote  “(7  considers  d  possible  at 
c”  by  c  d,  and  “G  implicitly  knows  <p  at  c”  by  c  [=  Ig<P-  We  define  G’s 
implicit  knowledge  by 

c  [=  Ig<P  iff  d  [=  y)  for  all  d  €  7^  satisfying  c  d. 

Intuitively,  G  implicitly  knows  (p  if  the  joint  view  of  G’s  members  guarantees 
that  <p  holds.  If  pi  knows  ij)  and  pj  knows  ip  D  ip,  for  example,  then  together 
they  implicitly  know  p,  even  if  neither  of  them  knows  p  individually. 

With  these  definitions  we  can  make  formal  sense  of  statement  such  as  “pi 
knows  p,”  but  we  can  sdso  make  sense  of  statements  such  as  “p,-  knows  pi 
knows  p”  involving  multiple  levels  of  knowledge.  Continuing  in  this  way,  we 
reach  in  the  limit  the  state  of  common  knowledge  [HM84].  Roughly  speaking, 
a  fact  p  is  common  knowledge  to  a  group  of  agents  if  everyone  in  the  group 
knows  py  everyone  knows  everyone  knows  py  and  so  on  ad  infinitum.  The 
state  of  common  knowledge  will  be  central  to  our  analysis  in  Chapter  3. 
Its  central  role  will  result  from  the  close  correspondence  between  common 
knowledge  among  the  members  of  a  group  of  processors  in  a  distributed 
system  and  the  simultaneous  performance  of  em  action  by  members  of  this 
group. 

The  first  step  in  defining  common  knowledge  is  to  define  what  it  means 
for  everyone  in  a  group  to  know  a  fact.  For  a  fixed  group  G  of  agents,  the 
standard  definition  [HM84]  of  everyone  in  G  knows  p  is  given  by 

/\  Kiip. 

PiGG 

The  definition  [HM84]  oip  is  common  knowledge  to  Gy  therefore,  is  given  by 


CaP  EgP  a  EaEaP  A  •  •  •  A  E^p  A  •  •  • . 

Here  we  define  E^p  inductively  by  E°p  =  p  and  E^p  =  Ea{Eg~^p)  for 
k  >  1.  In  other  words,*  c  \=  Cap  iff  c  }=  E’^p  for  all  A;  >  1.  Thus,  roughly 
speaking,  a  fact  is  common  knowledge  if  everyone  knows  it,  everyone  knows 
that  everyone  knows  it,  and  so  on  ad  infinitum. 
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In  practice,  however,  the  group  of  interest  will  not  be  a  jfixed  set  of  agents. 
For  example,  in  Chapter  3  we  will  be  most  interested  in  facts  that  are  common 
knowledge  to  the  group  Af  of  nonfaulty  processors.  The  precise  meaning  of  a 
nonfaulty  processor  is  not  important  here,  so  we  do  not  define  Af  formally  at 
this  point;  simply  observe  that  a  processor  may  be  considered  faulty  at  some 
points  and  not  at  others,  and  hence  that  the  set  of  “nonfaulty  processors”  is 
not  a  constant,  fixed  set  of  processors  but  varies  from  point  to  point.  This 
motivates  the  definition  of  common  knowledge  to  a  slightly  more  general 
notion  of  groups  of  agents.  An  indexical  set  S  of  agents  is  a  function  mapping 
points  to  sets  of  agents  (meaning  iS  is  a  set  whose  value  is  indexed  by  points, 
so  to  speak).  That  is,  5  :  c  i-»  «S(c),  where  «S(c)  is  a  set  of  agents.  The  notion 
of  an  indexical  set  is  a  direct  generalization  of  the  notion  of  a  fixed  set  of 
agents.  In  particular,  we  can  identify  a  fixed  set  of  agents  with  a  constant 
indexical  set.  The  group  Af  of  nonfaulty  processors,  the  group  P  of  all 
processors,  the  group  of  all  processors  that  haven’t  displayed  faulty  behavior 
by  the  current  time,  and  many  other  groups  of  interest  are  all  indexical 
sets  of  processors.  In  practice,  each  of  these  indexical  sets  is  nonempty.  For 
example,  since  it  is  common  in  the  literature  to  assume  that  the  upper  bound 
on  the  number  of  faulty  processors  to  be  tolerated  is  strictly  less  that  the 
number  of  processors  in  the  system,  the  set  of  nonfaulty  processors  is  always 
nonempty.  Formally,  an  indexical  set  S  is  nonempty  (in  a  given  system  It) 
if  5(c)  is  nonempty  for  every  point  c  of  "K.  For  technical  convenience,  we 
restrict  our  attention  to  nonempty  indexical  sets. 

The  first  step  in  defining  what  it  meiuis  for  a  fact  (p  to  be  common  knowl¬ 
edge  to  agents  in  an  indexical  set  is  to  define  what  it  means  for  everyone  in 
the  indexical  set  to  know  <p.  In  extending  the  standard  definition  to  indexi¬ 
cal  sets,  a  subtle  decision  must  be  made.  The  immediate  generalization  is  to 
define 

js.v’ =  A 

Pi^S 

This  means  that  c  \=  Es^  iff  c  [=  Kitp  for  every  pi  6  5(c).  This  general¬ 
ization,  however,  does  not  capture  a  subtle  aspect  of  agents’  knowledge  in 
unreliable  systems.  Consider,  for  example,  a  system  with  some  action  a  in 
which  it  is  guaranteed  that  all  nonfaulty  processors  perform  a  simultaneously 
whenever  any  nonfaulty  processor  does  so.  (Again,  the  precise  definition  of 
a  nonfaulty  processor  is  not  important  here.)  Suppose  the  nonfaulty  proces¬ 
sors  perform  a  at  a  point  c.  It  seems  reasonable  to  expect  that  at  the  point 
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c  all  nonfaulty  processors  know  that  all  nonfaulty  processors  are  performing 
o;  in  other  words,  c  \=  Eu<p  where  tp  is  the  fact  “all  nonfaulty  processors 
are  performing  a.”  The  reasoning  is  as  follows:  each  nonfaulty  processor  is 
performing  a,  so  each  nonfaulty  processor  knows  a  is  being  performed  by  a 
nonfa’  'ty  processor;  and  since  a  is  guaranteed  to  be  performed  simultane¬ 
ously  oy  all  nonfaulty  processors  whenever  it  is  performed  by  any  nonfaulty 
processor,  each  nonfault}*^  processor  knows  all  nonfaulty  processors  are  per¬ 
forming  a.  This  line  of  reasoning,  however,  depends  on  a  nonfaulty  processor 
knowing  it  is  a  nonfaulty  processor,  which  need  not  be  the  case  (and  it  cer¬ 
tainly  won’t  be  the  case  in  Chapter  3).  The  only  thing  a  nonfaulty  processor 
really  knows  at  the  point  c  is  that  if  it  is  nonfaulty,  then  the  action  a  is  being 
performed  by  all  nonfaulty  processors. 

While  it  is  possible  for  a  nonfaulty  processor  to  be  a  member  of  the 
indexical  set  Af  without  knowing  it  is  a  member  of  Af,  it  is  not  hard  to  see 
that  for  any  fixed  (or  constant)  set  G,  an  agent  is  a  member  of  G  iff  it  knows 
it  is  a-  member  of  G.  This  follows  directly  from  the  definition  of  knowledge, 
since  if  pi  €  G,  then  Pi€  G  holds  at  all  points  (and  in  particular  at  all  points 
Pi  considers  possible),  and  hence  p<  knows  Pi  €  G.  Similarly,  given  an  agent 
Pi  G  G,  it  is  not  hard  to  see  that  pi  knows  <p  iff  pi  knows  {pi  €  G)  D  (p:  if 
Pi  knows  (p,  then  p,  and  hence  (pi  €  G)  D  p,  holds  at  all  points  Pi  considers 
possible,  and  therefore  p,-  knows  (pi  €  G)  D  p.  An  equivalent  definition  of 
EcP,  therefore,  is 

E.,V=  A  JTifeSG  Dr), 

Pi€G 

which  says  that  Eop  holds  iff  each  agent  in  G  knows  that,  if  it  is  a  member  of 
G,  then  p  holds.  We  choose  this  form  of  “everyone  knows”  as  the  appropriate 
form  to  generalize  to  indexical  sets.  Formally,  we  define  EsP  by 

B,r=  AK,[p,eS  D  r). 

Pi  €5 


We  now  define  CsP  by 

CsP  ^  FjsP  a  EsEsP  a  '  •  •  a  E'^p  a  •  •  • . 

These  definitions  of  Es  and  Cs  directly  generalize  the  standard  definitions 
from  [HM84]  and  [DM86]. 
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A  useful  tool  for  thinking  about  E^(p  and  Cs<P  is  the  aimilarity  graph 
{relative  to  S).  This  is  an  undirected  graph  whose  nodes  are  the  poin+s  of 
the  system,  and  whose  edges  are  defined  as  follows:  two  points  c  and  are 
connected  by  an  edge  iff  some  agent  p,-  that  is  a  member  of  both  iS(c)  and 
S{d)  has  the  same  local  state  at  both  c  and  d  (that  is,  c  d).  For  example, 
if  S  is  the  set  Af  of  nonfaulty  processors,  two  points  are  connected  by  an  edge 
in  the  similarity  graph  iff  there  is  a  processor  that  is  nonfaulty  at  both  points, 
and  has  the  same  local  state  at  both  points.  The  property  of  the  similarity 
graph  making  is  such  a  useful  tool  is  that  its  connected  components  essential 
characterize  what  facts  are  common  knowledge  at  any  given  point.  To  see 
this,  we  first  note  that  an  easy  argument  by  induction  on  k  shows  that 

Proposition  2.1:  c  |=  E^(p  iff  d  |=  ^  for  all  points  d  of  distance  at  most  k 
from  c  in  the  similarity  graph  relative  to  S. 

Proof:  We  proceed  by  induction  on  fc.  The  induction  hypothesis  clearly 
holds  for  the  case  of  fc  =  0  since  E°tp  =  iphy  definition. 

Consider  the  case  of  A  =  1.  (Our  previous  restriction  to  nonempty  index- 
ical  sets  is  crucial  here.)  Suppose  c  |=  Egifi.  If  d  is  of  distance  at  most  1  from 

c,  then  some  Pi  in  both  <S(c)  and  S{d)  has  the  same  local  state  at  both  c  and 

d.  Since  Pi  €  S{c)  and  c  \=  Es<p,  we  have  c  |=  Ki{pi  €  S  D  <p).  Since  d  c, 
we  have  d  [=  (pi  €  «S)  D  if]  and  since  p;  6  «5(d),  we  have  d  [=  ^.  It  follows 
that  d  1=  yj  for  all  d  of  distance  at  most  1  from  c.  Suppose,  conversely,  that 
d  1=  ^  for  all  d  of  distance  at  most  1  from  c.  Suppose  pi  S  «5(c),  and  suppose 
Pi  has  the  same  locad  state  at  both  c  and  d.  If  Pi  €  S{d),  then  d  is  of  distance 
at  most  1  from  c  in  the  graph,  so  d  \=  tp  and  hence  d  |=  (pi  G  5)  D  If 
Pi  ^  5(d),  then  clearly  d  [=  (p,-  €  5)  D  y?.  Since  this  statement  holds  for  all 
points  d  c,  we  have  c  |=  Ki{pi  €  5  D  y?);  and  since  this  is  true  for  aill 
Pi  G  5(c),  we  have  c  |=  Es<p. 

For  fc  >  1,  suppose  the  inductive  hypothesis  holds  for  k  —  1.  Notice  that 
c  1=  Ej^if  iff  c  [=  Es{Es~^(p).  By  the  induction  hypothesis,  c  |=  Es{Es~^tp) 
iff  d  !=  Eg~^<p  for  all  d  of  distance  at  most  1  from  c,  and  d  [=  Eg~^ip  iff  e  |=  yj 
for  all  e  of  distance  at  most  k  —  1  from  d.  It  follows  that  c  |=  E^tp  iff  e  [=  y> 
for  all  e  of  distance  at  most  k  from  c.  □ 

Finally,  since  c  \=  Cs<p  iff  c  [=  £J*y?  for  all  fc  >  1,  it  follows  that 

Proposition  2.2:  c  )=  Cs<p  iff  d  |=  y>  for  all  points  d  in  c’s  connected  com¬ 
ponent  in  the  similarity  graph  relative  to  5. 
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Two  points  c  and  d  are  said  to  be  similar  {relative  to  S),  which  we  denote 
by  c  ~  d,  if  they  are  in  the  same  connected  component  of  the  similarity  graph 
relative  to  S.  Since  the  indexical  set  S  is  generally  clear  from  context  (in 
Chapter  3  most  often  being  the  set  Af  of  nonfaulty  processors),  we  denote 
similarity  by  ~  without  the  superscript  S.  We  thus  have: 

Theorem  2.3:  c  |=  Gs<p  iff  d  |=  for  all  d  satisfsring  c  d. 

Our  analysis  in  Chapter  3  will  exploit  this  relationship  between  common 
knowledge  and  the  similarity  graph.  The  similarity  graph  will  provide  us 
with  a  useful  combinatorial  tool  with  which  to  study  when  facts  become 
common  knowledge. 


2.3  Logic  of  Knowledge 

We  remark  at  this  point  that  the  definitions  of  knowledge  and  common  knowl¬ 
edge  we  have  given  have  been  purely  semantic  definitions.  We  have  talked 
about  agents  knowing  facts,  but  we  have  not  said  where  these  facts  come 
from  other  than  to  say  that  each  fact  corresponds  to  a  set  of  points  in  a  sys¬ 
tem.  It  is  often  convenient  to  have  a  formal,  logical  language  of  knowledge 
amd  common  knowledge  in  which  we  can  make  statements  about  an  agent’s 
knowledge.  We  now  show  how  to  define  such  a  language. 

Let  $  be  some  arbitrary  set  of  primitive  propositions.  Intuitively,  these 
propositions  are  statements  about  points  in  the  system  that  do  not  make 
explicit  mention  of  am  agent’s  knowledge,  statements  such  as  “the  value  of 
register  x  is  0”  oi  “processor  pi  failed  in  round  3.”  Let  £($)  be  the  language 
obtauned  by  closing  $  under  the  standaurd  boolean  connectives  (conjunction 
and  negation)  and  the  knowledge  operators  of  the  form  Ki,  Iq,  Est  Eg,  amd 
Cg  (one  might  also  consider  adding  some  of  the  standard  modal  operators 
from  linear-time  temporal  logic  such  as  □  and  O).  In  other  words,  /^($)  is 
the  smallest  language  with  the  property  that  if  <p  amd  are  contained  in 
£($),  then  so  are  ip  /\ip,  -itp,  Kip,  etc.  Strings  in  the  language  £($)  aie 
called  formulas.  We  use  pDtj}  as  a  shorthamd  for  meaning  that  the 

truth  of  (p  implies  the  truth  of  tb. 

So  far,  the  formulas  in  /^($)  are  just  strings  in  a  language  with  no  intrinsic 
meaining  in  themselves.  In  order  to  give  these  formulas  meaning  in  a  system 
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7^,  we  require  a  truth  assignment  r  that  maps  each  of  the  primitive  proposi¬ 
tions  y)  €  $  to  the  set  of  points  r(vj)  of  H  at  which  (p  is  true.  Given  such  a 
truth  assignment,  the  truth  of  an  arbitrary  formula  tp  E  £($)  is  defined  by 
induction  on  the  structure  of  <p  using  the  definitions  given  above: 


c\=p 

iff 

c  €  t{p)  whenever  p 

c\=p  Alj) 

iff 

c\=p  and  c  1= 

c\=-yp 

iff 

c\jkp 

c  1=  KiP 

iff 

d  1=  yj  for  all  d  c 

c  h  lap 

iff 

d  1=  ^  for  all  d  c 

c  1=  EsP 

iff 

d  ^  whenever  d'^iC  and  pi  E  «S(c) 

c  1=  E^p 

iff 

c  1=  Es{Es~^p)  whenever  k>  1 

c  h  CsP 

iff 

c  [=  EsP  for  all  A  >  1 

We  assume  that  associated  with  every  system  is  a  truth  assignment  tti 
determining  for  every  primitive  proposition  in  $  the  points  of  TZ  at  which 
the  proposition  is  true.  From  this  assumption  it  follows  that  every  formula 
in  our  language  corresponds  to  the  set  of  points  of  TZ  at  which  the  formula 
is  true.  Thus  it  follows  that  every  formula  in  our  language  corresponds  to 
a  fact,  a  set  of  points  of  TZ,  as  previously  defined.  For  this  reason  we  will 
sometimes  abuse  terminology  and  use  the  word  “fact”  in  place  of  “formula.” 
We  remuk  that  in  later  chapters  we  will  be  adding  more  knowledge  operators 
to  our  language  £($)  as  we  refine  our  definitions  of  knowledge. 

Finally,  a  formula  p  is  said  to  be  valid  in  the  system  7Z,  which  we  denote 
by  7^  1=  yj,  if  9?  is  true  at  all  points  of  TZ  (as  determined  by  the  system’s 
truth  assignment  r-n).  A  formula  p  is  said  to  be  valid,  which  we  denote  by 
\=p,i{p  is  vjilid  in  the  system  TZ  for  all  systems  TZ. 


2.4  Properties  of  Knowledge 


The  notions  of  knowledge,  implicit  knowledge,  and  common  knowledge  de¬ 
fined  above  are  closely  related  to  modal  logics  that  have  been  extensively 
studied  by  philosophers  (see  [Hin62]).  A  modal  operator  M  is  said  to  have 


'Notice  iiiai  ilils  is  equivalent  to  deuning  c 


IP  iff  ^ 


c. 


^  »ll 

^  r/ 


Pi  6  5(c).  The  advantage  to  onr  definition  is  we  do  not  have  to  worry  about  whether 
Pi  €  5,  and  hence  Pi  €  5  D  p,  is  a  formula  in  our  language. 
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the  properties  of  the  modal  system  S5  if  the  following  inference  rule  is  satisfied 
for  every  system  TZ 

1.  if  (f  is  valid  in  the  system  TZ  then  M<p  is  valid  in  the  system  TZ 
and  the  following  formulas  are  valid 

2.  Mtp  D  yj, 

3.  (My>  A  Af(v?  D  V*))  3  Mij), 

4.  M(p  D  MMyj,  and 
6.  -'M(p  D  M-'Mip. 

If  we  take  M  to  be  the  knowledge  operator  if,-,  these  statements  may  be 
interpreted  as  follows:  the  first  statement  says  an  agent  knows  all  facts  tp 
that  are  necessarily  true;  the  second  says  an  agent  can  know  only  true  facts, 
since  it  says  that  if  an  agent  knows  <p  then  must  be  true;  the  third  says 
an  agent  knows  all  consequences  of  its  knowledge,  since  if  it  knows  both  f 
and  then  it  also  knows  if)\  the  fourth  says  that  an  agent  knows  what 

it  knows,  since  if  an  agent  knows  (p,  then  it  knows  that  it  knows  (p;  and  the 
fifth  says  that  an  agent  knows  what  it  doesn’t  know,  since  if  an  agent  does 
not  know  y>,  then  it  knows  it  does  not  know  <p.  It  is  not  hard  to  show  that 
the  definitions  of  knowledge,  implicit  knowledge,  and  common  knowledge  as 
given  above  immediately  implies  the  following  (cf.  [HM85,  DM86]): 

Proposition  2.4;  The  operators  Ki,  /©,  and  Cs  have  the  properties  of  the 
modal  system  S5. 

Proof:  We  sketch  the  proof  for  the  knowledge  operator  Ki^  and  leave  the 
remaining  operators  for  the  reader. 

1.  Suppose  yj  is  valid  in  the  system  TZ.  For  any  point  c  of  TZ,  since  tp  is 
valid  in  the  system  it  follows  that  d  |=  ^  for  all  points  d  c,  and 
hence  that  c  |=  Kup.  Since  c  |=  Ki^p  for  any  point  c  of  %,  Kitp  is  valid 
in  the  system  TZ. 

Let  c  be  an  arbitrary  point  of  an  arbitrary  system  TZ. 

2.  If  c  1=  Kiip,  then  d  |=  y?  for  all  d  c,  and  in  particular  c  |=  y>. 
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3.  If  c  1=  Kitp  and  c  |=  Ki{(p  D  ^),  then  d  \=  tp  and  d  |=  9?  D  V*  for  all 
d  ~i  c.  It  follows  that  d  |=  for  all  d  c,  and  hence  that  c  |=  Kiiff. 

4.  Suppose  c  1=  Ki(p,  and  suppose  d  c.  Notice  that  e  d  implies 

e  c,  since  e  d  eind  d  c  imply  that  p;  has  the  same  local  state  at 
edl  three  points.  Since  c  |=  Ki<p  implies  e  |=  ^  for  all  e  c,  it  follows 
that  e  1=  for  all  e  d,  and  hence  that  d  \=  Kip.  Since  this  is  true 

for  all  d  c,  it  follows  that  c  |=  KiKi^p 

5.  Suppose  c  1=  ->Kip,  and  suppose  d  c.  Since  c  |=  ->Kiip,  we  have 

e  \/=  (p  ioT  some  e  c.  But,  as  above,  e  '^i  d  and  hence  d  t=  ~'Ki(p. 
Since  this  is  true  for  all  d  c,  it  follows  that  c  |=  Ki-'Kip.  □ 

In  addition  to  the  properties  of  S4,  common  knowledge  satisfies  two  addi¬ 
tional  properties  that  will  prove  essential  to  our  analysis  in  Chapter  3.  One 
of  these  useful  properties  is  the  so-called  fixed  point  axiom 


CsP  =  Es{p  A  CsP)- 


or 

CsP  s  Es{Csp) 

which  states  that  common  knowledge  is  a  fixed  point  of  the  Eg  operator.* 
It  implies  that  a  fact’s  being  common  knowledge  is  in  a  sense  “public:”  a 
fact  can  be  common  knowledge  to  a  group  of  agents  only  if  all  members  of 
the  group  know  that  it  is  common  knowledge.  This  axiom  also  implies  that 
when  a  fact  becomes  common  knowledge,  it  becomes  common  knowledge 
to  all  relevant  agents  simultaneously.  The  proof  that  common  knowledge 
satisfies  this  fixed  point  axiom  is  instructive: 

Proposition  2.5:  The  fixed  point  axiom  ‘^CsP  =  Es{p  A  Csp)”  is  valid. 

Proof:  Suppose  c  |=  CsP  for  some  arbitrary  point  c  of  some  arbitrary  system 
TZ.  This  means  c  1=  Eg'^^p  for  all  >  0.  Since  E^'^^p  =  Es(Egp),  for  every 
d  adjacent  to  c  we  have  d  |=  E^p  for  all  A:  >  0  by  Proposition  2.1,  and  hence 


^The  two  versions  of  the  fixed  point  axiom  turn  out  to  be  equivalent.  The  first  version  of 
the  axiom  generalizes  more  easily  to  variants  of  common  knowledge  considered  in  [HM84]. 


2.4.  PROPERTIES  OF  KNOWLEDGE 


33 


for  every  d  adjacent  to  c  we  have  both  d\=tp  and  d  |=  Gs>p  (remember  that 
E^ip  =  by  definition).  It  follows  that  c  j=  Es{<^  A  Cs'p)- 

Suppose,  conversely,  c  |=  Es{}p  A  Gs^)^  By  Proposition  2.1,  this  means 
d  1=  V’  A  Gs^  for  all  points  d  for  distance  at  most  1  from  c,  and  in  particular 
that  c  1=  A  Gs<p,  so  c  1=  Gs<p  as  desired.  □ 

The  second  useful  property  of  common  knowledge  is  captured  by  the 
following  induction  rule: 

If  <fi  D  Es<p  is  valid  in  the  system  7^, 

then  tp  D  Gs<f  is  valid  in  the  system  %. 

Roughly  speaking,  the  induction  rule  implies  that  if  a  fact  is  “public”  to  a 
group  of  processors,  in  the  sense  that  whenever  it  holds  it  is  known  to  all 
members  of  the  group,  then  whenever  it  holds  it  is  in  fact  common  knowledge. 

Proposition  2.6:  The  induction  rule  “if  <p  D  Es<p  is  valid  in  the  system, 
then  (p  D  Gs'P  is  valid  in  the  system”  is  sound. 

Proof:  Suppose  p  D  EsP  is  valid  in  the  system  for  some  arbitrary  system 
It.  To  prove  p  D  GsP  is  valid  in  the  system  Tty  we  assume  c  |=  and 
show  c  1=  GsP-  It  is  enough  to  show  c  |=  E^p  for  aU  fc  >  0.  We  proceed 
by  induction  on  k.  For  fc  =  0,  the  fact  that  c\=  p  and  that  F^p  =  y?  by 
definition  imply  c  \=  E^p.  For  A  >  0,  suppose  the  inductive  hypothesis  holds 
for  A  —  1.  By  the  induction  hypothesis  for  A  —  1  we  have  c  \=  Eg~^p,  so 
Proposition  2.1  guarantees  d  |=  y)  for  all  points  d  of  distance  at  most  A  —  1. 
Since,  however,  p  D  EsP  is  valid  in  the  system,  we  have  d  1=  EsP,  and 
Proposition  2.1  guarantees  e  |=  v?  for  all  e  of  distance  at  most  1  from  d. 
But  this  means  e  [=  for  all  e  of  distance  at  most  A  from  c,  and  hence  by 
Proposition  2.1  that  c  |=  E^p  as  desired.  □ 

In  the  remainder  of  this  work,  the  notions  of  knowledge,  implicit  knowl¬ 
edge,  and  common  knowledge  together  with  their  properties  proven  in  this 
section  will  be  fundamental  to  our  study  of  problems  in  distributed  comput¬ 
ing. 

We  end  this  chapter  with  a  short  discussion  of  the  formulas  or  facts  an 
agent  is  said  to  know.  According  to  our  definitions,  facts  are  properties  of 
points:  they  are  either  true  or  false  at  any  given  point.  While  facts  are  said 
to  be  true  or  false  of  points,  many  times  the  truth  of  a  fact  is  determined 
by  some  simple  property  of  a  point.  Many  times,  for  example,  the  truth  at 
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a  point  of  a  fact  like  “the  last  coin  flipped  landed  heads”  is  determined 
simply  by  the  point’s  global  state:  given  two  points  with  the  siime  global 
state,  the  fact  is  either  true  at  both  points  or  false  at  both  points.  Other 
times,  the  truth  of  a  fact  <p2  like  “all  coins  flipped  in  this  run  land  heads” 
is  determined  simply  by  the  run  at  the  current  point:  given  two  points  of 
the  same  run,  the  fact  is  either  true  at  both  points  or  false  at  both  points, 
depending  on  whether  all  coins  flipped  in  the  run  land  heads.  Notice  that  it 
is  possible  for  a  fact  like  <p2  to  be  true  at  one  point  (r,  k)  and  false  at  another 
point  (r'j  k),  even  though  they  have  the  same  global  state.  This  is  the  case, 
for  example,  if  all  coins  flipped  in  r  land  heads,  all  coins  flipped  in  r'  through 
time  k  land  heads,  and  all  coins  flipped  in  r'  after  time  k  land  tails. 

Given  a  system  %,  let  us  define  a  property  to  be  a  mapping  from  the 
points  of  Tl  into  some  range;  for  example,  mapping  from  a  point  (r,  k)  to  the 
global  state  r{k)  or  the  run  r.  Intuitively,  such  a  mapping  maps  a  point  to 
some  property  of  the  point  that  is  of  particular  interest.  Given  a  system  7Z 
and  a  property  P,  we  say  a  fact  y)  is  a  fact  about  P  if  fixing  the  value  of  P 
determines  the  truth  of  <p:  given  two  points  with  the  same  value  of  P,  the 
fact  (p  is  either  true  at  both  points  or  false  at  both  points.  For  example,  if 
we  assume  the  global  state  records  the  sequence  of  coins  flipped  so  far  in  a 
run  (perhaps  this  sequence  is  recorded, in  the  environment),  then  the  fact  (pi 
above  is  a  fact  about  the  global  state  since  the  truth  of  <pi  at  two  points  with 
the  same  global  state  is  the  same;  and  (p2  is  a  fact  about  the  run  since  the 
truth  of  (p2  at  two  points  of  the  same  run  is  the  same. 

Finally,  recall  our  comment  that  the  set  $  of  primitive  propositions  in  our 
language  C{^)  typically  consists  of  statements  about  the  system  that  make  no 
explicit  mention  of  the  agents’  knowledge.  In  particular,  it  is  common  to  take 
these  propositions  to  be  facts  about  the  global  state.  In  a  given  system  1Z, 
we  say  the  language  £($)  is  state-generated H  each  of  the  propositions  €  # 
is  a  fact  about  the  global  state.  This  means  the  primitive  propositions  (p  axe 
simply  statements  about  the  global  state  (which  we  view  as  a  particularly 
simple  but  fundamental  kind  of  statement),  and  not,  for  example,  about 
future  events  in  the  run. 


Chapter  3 

Programming  Simultaneous 
Actions  Using 
Common  Knowledge 


In  this  chapter,  we  show  how  thinking  about  distributed  computation  in 
terms  of  knowledge  can  aid  in  the  deoign  and  analysis  of  protocols  for  a 
number  of  problems  appearing  in  the  literature,  and  in  the  proof  of  nontrivial 
lower  bounds  on  the  complexity  of  solving  these  problems  in  certain  failure 
models. 


3.1  Introduction 

The  problem  of  ensuring  proper  coordination  between  processors  in  dis¬ 
tributed  systems  whose  components  are  unreliable  is  both  important  and 
difficult.  There  are  generally  two  aspects  to  such  coordination;  the  actions 
the  different  processors  perform,  and  the  relative  timing  of  these  actions. 
Both  aspects  are  crucial,  for  instance,  in  maintaining  consistent  views  of  a 
distributed  database.  In  particular,  it  is  often  most  desirable  to  perform  co¬ 
ordinated  actions  simultaneously  at  different  sites  of  a  system.  It  is  therefore 
of  great  interest  to  study  the  design  of  protocols  involving  simultaneous  ac- 

This  chapter  is  joint  work  with  Yorjun  Moses.  Earlier  versions  have  appeared  in  Pro¬ 
ceedings  of  the  27th  IEEE  Symposium  on  Foundations  of  Computer  Science  [MT86]  and 
Algorithmica  [MT88]. 
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tions,  actions  performed  simultaneously  by  all  processors  whenever  they  are 
performed  at  all. 

In  [DM86],  Dwork  and  Moses  study  the  design  of  protocols  for  simultane¬ 
ous  Byzantine  agreement  in  the  crash  failure  model,  a  failure  model  in  which 
a  processor  fails  by  simply  halting,  never  sending  any  message  in  any  round 
following  its  halting  round.  Their  analysis  focuses  on  determining  necessary 
and  sufficient  conditions  for  reaching  simultaneous  Byzantine  agreement  in 
terms  of  the  processors’  states  of  knowledge  about  the  system.  As  a  result  of 
this  analysis,  they  derive  a  protocol  for  simultaneous  Byzantine  agreement 
with  the  unique  property  of  being  optimal  in  all  runs]  that  is,  their  protocol 
halts  as  early  as  any  protocol  for  the  problem  could,  given  the  pattern  of 
faulty  processor  behavior  that  occurs.  In  contrast,  previous  protocols  do  not 
adapt  their  behavior  on  the  basis  of  faulty  processor  behavior,  and  hence 
always  perform  as  poorly  as  they  do  in  their  worst  case  run.  Implicit  in  the 
work  of  Dwork  and  Moses  is  a  general  method  for  obtaining  optimal  protocols 
for  many  problems  involving  simidtaneous  actions  in  the  crash  failure  model. 
Their  technical  analysis,  however,  makes  strong  use  of  particular  properties 
of  the  crash  failure  model,  and  does  not  extend  to  more  complicated  failure 
models. 

This  chapter  presents  a  novel  approach  to  the  design  of  fault-tolerant 
protocols  in  several  variants  of  the  more  complex  omissions  failure  model,  a 
failure  model  in  which  processors  fail  only  by  intermittently  failing  to  send 
some  of  the  messages  they  are  required  by  their  protocol  to  send,  but  do 
not  necessarily  halt  as  in  the  crash  model.  We  explicitly  define  a  large  class 
of  simultaneous  choice  problems,  a  class  intended  to  capture  the  essence  of 
simultaneous  coordination  in  synchronous  systems.  Many  well-known  prob¬ 
lems,  including  simultaneous  Byzantine  agreement  [PSL80,  Fis83,  DM86], 
distributed  firing  squad  [BL87,  CDDS85,  Rab],  etc.,  cem  be  formulated  as 
simultaneous  choice  problems.  As  the  result  of  a  delicate  knowledge-based 
analysis  in  these  failure  models,  we  derive  at  once  protocols  that  are  optimal 
in  all  runs  for  all  simultaneous  choice  problems:  Each  protocol  is  guaranteed 
to  perform  the  desired  simultaneous  actions  as  soon  as  any  protocol  for  the 
problem  could,  given  the  input  to  the  system  and  the  pattern  of  faulty  pro¬ 
cessor  behavior.  (We  will  use  optimal  as  shorthand  for  optimal  in  all  runs.) 
Thus,  we  show  how  a  knowledge-based  analysis  can  be  used  as  a  general  tool 
for  the  design  of  protocols  for  an  entire  class  of  problems.  Our  analysis  ap¬ 
plies  to  the  crash  failure  model  as  well,  and  formally  extends  the  statements 
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of  results  in  [DM86]  to  the  whole  class  of  simultaneous  choice  problems  (al¬ 
though  most  of  the  proof  techniques  we  use  are  quite  different  from  those  in 
[DM86]). 

Our  approach  is  based  on  the  close  relationship  between  knowledge,  com¬ 
munication,  and  action  in  distributed  systems:  A  number  of  recent  works  (see 
[HM84],  [DM86],  and  [Mos86])  show  that  simultaneous  actions  are  closely 
related  to  common  knowledge.  Recall  that,  informally,  a  fact  is  common 
knowledge  if  it  is  true,  everyone  knows  it,  everyone  knows  that  everyone 
knows  it,  and  so  on  ad  infinitum.  Notice  that  every  processor  performing  a 
simultaneous  action  knows  the  action  is  being  performed.  In  addition,  since 
such  actions  are  performed  simultaneously  by  all  processors,  every  processor 
knows  that  ciU  processors  know  the  action  is  being  performed.  This  aurgument 
can  be  (cind  will  be)  formalized  and  extended  to  show  that  when  a  simulta¬ 
neous  action  is  performed,  it  is  common  knowledge  that  the  action  is  being 
performed.  Consequently,  a  necessary  condition  for  performing  simultane¬ 
ous  actions  is  attaining  common  knowledge  of  particular  facts  (cf.  [HF85]). 
Interestingly,  our  work  shows  that  in  a  precise  sense  this  is  also  a  sufficient 
condition;  The  problem  of  performing  simtdtaneous  actions  reduces  to  the 
problem  of  attaining  common  knowledge  of  particular  facts. 

In  deriving  optimal  protocols  for  simultaneous  choice  problems,  we  make 
explicit  and  direct  use  of  the  relationship  between  common  knowledge  and 
simultaneous  actions.  The  derivation  proceeds  in  two  stages.  In  the  first 
stage,  we  program  the  optimal  protocols  in  a  high-level  language  where  pro¬ 
cessors’  actions  depend  on  explicit  tests  for  common  knowledge  of  certain 
facts.  These  high-level  protocols  are  extracted  directly  firom  the  problem 
specifications  via  a  few  simple  manipulations.  The  second  stage  deals  with 
effectively  implementing  these  tests  for  common  knowledge.  We  give  a  direct 
implementation  of  such  tests  in  all  variants  of  the  omissions  failure  model  we 
consider.  As  a  result,  our  high-level  protocols  have  effective  implementations 
in  these  failure  models  as  low-level,  standard  protocols  that  are  optimal  in 
all  runs. 

Consider,  for  example,  the  following  version  of  the  distributed  firing  squad 
problem  (cf.  [BL87,  CDDS85,  Rab]):  An  external  source  may  send  “start” 
signals  to  some  of  the  processors  in  the  system  at  unpredictable  times,  pos¬ 
sibly  different  times  for  different  processors.  It  is  required  that  (i)  if  any 
nonfaulty  processor  receives  a  “start”  signal,  then  all  nonfaulty  processors 
perform  an  irreversible  “firing”  action  at  some  later  point  (which  means  each 
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nonfaulty  processor  enters  some  distinguished  “firing”  state  it  never  leaves), 
(ii)  whenever  any  nonfaulty  processor  “fires,”  all  nonfaulty  processors  do  so 
simultaneously,  and  (iii)  if  no  processor  receives  a  “start”  signal,  then  no  non¬ 
faulty  processor  “fires.”  The  high-level  protocol  we  derive  for  this  problem 
in  the  omissions  model  requires  all  processors  to  act  as  follows: 

repeat  every  round 

send  current  local  state  to  every  processor 
until  it  is  common  knowledge  that 

some  processor  received  a  “start’*  signal; 

“lire”  and  halt. 

Since  we  exhibit  an  effective  implementation  of  the  test  for  common  knowl¬ 
edge  embedded  in  this  protocol,  this  high-level  protocol  can  be  transformed 
into  a  standard  protocol  that  is  optimal  in  all  runs.  No  previous  protocol 
for  this  problem  suggested  in  the  literature  is  optimal  in  all  runs.  Further¬ 
more,  in  many  cases  this  protocol  “fires”  much  earlier  than  any  other  known 
protocol  for  this  problem:  In  some  cases,  this  protocol  “fires”  as  soon  as  one 
round  after  the  first  “start”  signal  is  received. 

We  show  that  optimal  protocols  for  simultaneous  choice  problems  can  al¬ 
ways  be  implemented  in  a  communication  efficient  way,  in  all  variants  of  the 
omissions  model  we  consider.  However,  our  direct  implementation  of  tests 
for  common  knowledge  is  not  computationally  efficient:  It  requires  proces¬ 
sors  to  perform  exponential-time  computations  between  consecutive  rounds 
of  communication.  One  of  the  major  technical  contributions  of  this  chap¬ 
ter  is  a  method  of  efficiently  implementing  tests  for  common  knowledge  in 
several  variants  of  the  omissions  failure  model.  In  the  standard  omissions 
model,  a  failure  model  in  which  processors  fail  only  by  intermittently  failing 
to  send  some  of  the  messages  they  are  required  by  their  protocol  to  send, 
we  provide  a  clean  and  concise  method  of  efficiently  implementing  tests  for 
common  knowledge.  The  analysis  imderlying  this  method  reveals  the  basic 
combinatorial  structure  underlying  the  omissions  model,  as  well  as  crisply 
characterizing  the  set  of  facts  that  can  be  common  knowledge  at  any  point 
in  the  execution  of  a  protocol.  In  the  receiving  omissions  model,  a  failure 
model  in  which  processors  fail  only  by  intermittently  failing  to  receive  some 
of  the  messages  sent  to  them  rather  than  failing  to  send  messages,  testing 
for  common  knowledge  is  shown  to  be  trivial.  This  exposes  a  signihcant 
difference  between  two  seemingly  symmetric  failure  models. 
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We  are  not  able  to  efHcieiLtly  implement  tests  for  common  knowledge  in 
the  generalized  omissions  model,  in  which  fatdty  processors  may  fail  both 
to  send  and  to  receive  messages.  In  fact,  we  show  that  testing  for  common 
knowledge  in  this  model  is  NP-hard.  As  a  result,  using  the  close  relationship 
between  common  knowledge  and  simultaneous  actions,  we  are  able  to  show 
that  no  optimal  protocol  for  any  reasonable  simultaneous  choice  problem  can 
be  computationally  efficient  unless  P=NP.  In  particular,  in  this  model  there 
can  be  no  computationally-efficient  optimal  protocol  for  the  distributed  firing 
squad  problem  stated  above,  for  simultaneously  performing  Byzantine  agree¬ 
ment  (see  [PSL80,  DM86]),  or  for  most  any  other  simultaneous  problem.  We 
consider  another  variant  of  the  omissions  model,  czdled  generalized  omissions 
with  information,  in  which  it  is  assumed  that  the  intended  receiver  of  an  un¬ 
delivered  message  can  test  (and  therefore  knows)  whether  it  or  the  sender  is 
at  fault.  We  show  that  the  techniques  used  in  the  standard  omissions  model 
extend  to  this  model  as  well,  yielding  computationally-efficient  optimal  pro¬ 
tocols.  As  a  result,  we  see  that  optimal  protocols  for  simultaneous  choice 
problems  are  computationally  intractable  in  the  generalized  omissions  model 
precisely  because  of  the  fact  that  in  this  model  undelivered  messages  do  not 
uniquely  determine  the  set  of  faulty  processors. 

Thus,  we  show  how  to  derive  efficient  optimal  protocols  in  the  omissions 
model,  and  we  show  that  optimal  protocols  are  intractable  in  the  generalized 
omissions  model.  Since  it  is  unrealistic  to  expect  conventional  processors 
(limited  to  polynomial-time  computation)  to  follow  such  intractable  proto¬ 
cols,  it  becomes  becomes  interesting  to  ask  how  well  resource-bounded  pro¬ 
cessors  can  perform  simultaneous  actions  in  the  generalized  omissions  model. 
Analyzing  this  problem  requires  extending  the  theory  of  knowledge  given  in 
Chapter  2  to  account  for  the  restricted  computational  power  of  such  pro¬ 
cessors.  Such  an  extension  should  give  rise  to  notions  of  resource-bounded 
knowledge  and  common  knowledge  that  closely  correspond  to  the  ability  of 
resource-bounded  processors  to  perform  simultaneous  actioiis.  The  need  for 
a  theory  of  resource-bounded  knowledge  has  already  been  demonstrated,  pri¬ 
marily  by  cryptographic  problems  (e.g.,  [GM84,  GMR89]),  in  which  compu¬ 
tational  complexity  is  introduced  artificially  by  restricting  the  computational 
power  of  the  adversary,  thus  allowing  solutions  involving  encryption.  This 
work,  however,  provides  a  more  compelling  indication  of  the  need  for  such  a 
theory,  even  for  the  analysis  of  simple  problems  in  distributed  computation 
that  do  not  make  such  assumptions  about  the  adversary.  We  note  that  some 
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such  notions  of  knowledge  have  since  been  proposed  [Mos88,  HMT88,  FZ88], 
and  we  will  return  to  the  need  for  such  notions  in  Chapter  5  when  we  study 
cryptographic  protocols  in  terms  of  knowledge. 

Since  some  of  the  proofs  in  this  chapter  are  quite  technical,  their  details 
can  meike  it  difficult  to  obtain  a  high-level  understanding  of  this  work.  We 
strongly  recommend  that  the  reader  skip  all  proofs  on  the  first  reading.  The 
rest  of  this  chapter  is  organized  as  follows:  Section  3.2  defines  the  model  of 
distributed  systems  used  in  the  chapter.  In  Section  3.3  we  define  the  no¬ 
tion  of  a  simultaneous  choice  problem,  a  large  class  of  problems  involving 
coordinated  simultaneous  actions.  Section  3.4  presents  a  uniform  method  of 
deriving  an  optim2d  high-level  protocol  &om  the  specification  of  a  simtdtane- 
ous  choice  problem,  using  explicit  tests  for  common  knowledge.  Section  3.5 
deals  with  the  problem  of  efficiently  implementing  tests  for  common  knowl¬ 
edge  of  facts  relevant  to  simultaneous  choice  problems  in  a  number  of  failure 
models.  This  section  is  the  heart  of  the  chapter.  The  analysis  in  this  section 
reveals  interesting  properties  of  the  different  failure  models,  and  exposes  fine 
distinctions  between  them.  Finally,  Section  3.6  contains  some  concluding 
remarks. 


3.2  Model  of  a  System 

This  section  introduces  a  model  of  the  distributed  systems  with  which  this 
chapter  is  concerned,  an  elaboration  of  the  model  given  in  Chapter  2.  Our 
treatment  extends  euid  is  closely  related  to  that  of  [DM86]. 

We  consider  synchronous  systems  of  unreliable  processors.  Such  a  system 
consists  of  a  finite  collection  P  =  {pi,,,,  ,p„}  of  n  >  2  processors,  each  pair 
of  which  is  connected  by  a  two-way  communication  link,  emd  each  sharing 
a  common  global  clock  that  starts  at  time  0  and  advances  in  increments 
of  one.^  We  model  such  systems  by  elaborating  the  model  of  computation 
given  in  Chapter  2  in  the  following  ways.  In  addition  to  receiving  messages 
from  other  processors  at  the  end  of  a  round,  a  processor  may  also  receive 
requests  for  service  from  clients  external  to  the  system  (think,  for  example, 
of  a  distributed  airline  reservation  system).  These  external  requests  from  the 

^We  assume  the  existence  of  a  shared  global  clock  for  ease  of  exposition.  The  analysis 
performed  in  this  chapter  applies  even  if  the  processors  have  their  own  local  clocks,  possibly 
displaying  different  times,  as  long  as  the  clocks  tick  (or  advance)  at  the  same  rate. 
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clients  are  considered  distinct  &om  the  internal  messages  sent  by  processors 
in  the  system.  Actions  resulting  from  the  servicing  of  such  requests  may 
take  a  variety  of  forms,  including  the  initiation  of  various  activities  within 
the  system  by  sending  certain  messages  to  other  processors  in  later  rounds. 
Each  message  sent  by  a  processor  is  assumed  to  include  the  identities  of  the 
sender  and  intended  receiver  of  the  message,  as  well  as  the  round  in  which  it 
is  sent;  similarly  for  each  request.  At  any  given  time,  a  processor’s  message 
history  is  a  set  containing  the  messages  it  has  received  so  far  from  the  other 
processors,  and  a  processor’s  input  history  is  a  set  containing  its  initial  state 
together  with  the  requests  it  hats  received  so  far  from  the  system’s  external 
clients.  A  processor’s  local  state  at  any  given  time  consists  of  its  message 
history,  its  input  history,  the  time  on  the  global  clock,  and  the  processor’s 
identity.  For  techniced  reasons,  it  will  be  convenient  to  talk  about  processors’ 
states  at  negative  times  (before  time  0).  A  processor’s  state  at  a  negative 
time  is  defined  to  be  a  distinguished  empty  state. 

We  assume  processors  are  following  a  deterministic  protocol  as  defined  in 
Chapter  2.  Notice,  however,  that  the  state  protocol  component  of  a  proces¬ 
sor’s  local  protocol  is  no  longer  of  interest  since  we  have  already  described 
how  a  processor’s  local  state  should  change  from  round  to  round,  and  we 
will  ignore  it  for  the  remainder  of  this  chapter.  Consequently,  an  equivalent 
definition  of  a  protocol  is  a  function  from  processor’s  local  state  to  a  list  of 
actions  the  processor  is  required  to  perform,  followed  by  a  list  of  messages 
the  processor  is  required  to  send.  While  we  assume  that  all  processors  in  the 
system  faithfully  follow  their  protocols,  sending  emd  receiving  messages  as 
required,  some  messages  may  be  lost  due  to  failures  in  the  system.  A  run  of 
a  protocol  in  the  absence  of  any  such  failure  is  defined  precisely  as  defined 
in  Chapter  2.  In  the  presence  of  failures,  however,  we  must  elaborate  this 
definition:  given  a  run  in  which  failures  occur,  a  processor’s  message  history 
at  time  k  no  longer  records  all  messages  sent  to  it  during  round  h  since  some 
of  these  messages  may  be  lost.  (Of  course,  the  processor’s  message  history 
at  time  k  will  record  all  messages  recorded  in  its  message  history  at  time 
As  —  1.)  We  attribute  lost  messages  to  failures  on  the  part  of  processors  (due 
to  the  failures  of  their  input  or  output  ports,  say),  and  the  various  failure 
models  we  consider  differ  only  in  how  we  assign  these  failures  to  processors. 


^  AW 


•  the  omissions  model  ([MSF83J),  in  which  a  lost  message  indicates  that 
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the  sender  of  the  message  is  faulty; 

•  the  receiving  omissions  model,  in  which  a  lost  message  indicates  that 
the  receiver  is  faulty; 

•  the  generalized  omissions  model  ([PT86]),  in  which  a  lost  message  in¬ 
dicates  that  either  the  sender  or  receiver  is  faulty;  and 

«  generalized  omissions  with  information,  which  differs  from  the  general¬ 
ized  omissions  model  in  that  the  intended  receiver  of  a  lost  message  is 
told  whether  the  sender  or  the  receiver  is  faulty. 

When  the  sender  of  a  lost  message  is  said  to  be  at  fault,  we  say  the  processor 
failed  to  send  the  message;  and  when  the  receiver  of  a  lost  message  is  said  to 
be  at  fault,  we  say  the  processor  failed  to  receive  the  message. 

We  now  define  the  notion  of  a  failure  pattern,  a  formal  description  of 
faulty  processor  behavior  during  a  run.  The  notion  of  a  failure  pattern  in 
each  variant  of  the  omissions  model  is  a  suitable  restriction  of  the  general 
definition  given  here.  Remember  that  a  faulty  processor  may  fail  to  send  or 
receive  certain  messages.  It  is  therefore  natural  to  define  the  faulty  behavior 
of  a  processor  p  to  be  a  pair  of  functions  S  and  R  mapping  round  numbers  to 
sets  of  processors.  Intuitively,  these  are  the  processors  p  fails  to  send  messages 
to  or  receive  messages  from,  respectively,  during  each  round.  A  failure  pattern 
is  a  collection  of  faulty  behaviors  {Si,Ri),  one  for  each  processor  p^.  The 
processor  pi  is  said  to  be  faulty  in  such  a  failure  pattern  if  either  of  the 
sets  Si{k)  or  Ri{k)  is  nonempty  for  some  k,  in  which  case  pi  is  said  to  fail 
during  round  k,  and  pi  is  said  to  be  nonfaulty  otherwise.  K,  for  example,  the 
set  Si{k)  contains  the  processor  pj,  we  say  that  pi  is  faulty  since  any  message 
Pi’s  protocol  requires  that  it  send  to  pj  will  be  lost.  Notice,  however,  that 
a  faulty  processor  need  not  actually  exhibit  any  faulty  behavior  at  all  since 
the  fact  that  any  message  from  p,-  to  pj  during  round  k  is  lost  will  never  be 
discovered  if  Pi’s  protocol  does  not  require  it  to  send  any  message  to  Pj  in 
round  k. 

The  failure  pattern  of  a  run  is  a  failure  pattern  with  the  property  that 
in  every  round  k  each  processor  Pi  sends  no  messages  to  processors  in  Si{k) 
but  sends  all  required  messages  to  processors  not  in  Si{k),  and  receives  no 
messages  hrom  processors  in  Ri{k)  but  receives  all  messages  sent  to  it  by 
processors  not  in  Ri{k).  Notice,  by  the  way,  that  a  run  may  be  consistent 
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with  more  than  one  failure  pattern  if  the  protocol  being  followed  does  not 
require  processors  to  send  messages  to  every  processor  in  every  round.  Given 
a  run  r,  if  7i  is  the  complete  input  history  of  processor  pi  in  r,  then  we  say 
that  7  =  (7» , . . . ,  7n)  is  the  input  to  r. 

A  pair  (7r,7),  where  tt  is  a  failure  pattern  zmd  7  is  an  input,  is  called  an 
operating  environment.  Notice  that  an  operating  environment  is  independent 
of  einy  particular  protocol.  An  operating  environment  simply  determines  for 
each  processor  and  for  each  round  what  faulty  behavior  it  will  exhibit  (if 
any)  during  the  round  and  what  external  requests  it  will  receive  during  the 
round,  regardless  of  the  protocol  the  processor  is  following.  Given  an  op¬ 
erating  environment  together  with  a  particular  protocol,  however,  the  two 
uniquely  determine  a  run  of  the  given  protocol  (in  the  given  operating  envi¬ 
ronment).  Two  runs  of  two  different  protocols  are  said  to  be  corresponding 
runs  if  they  have  the  same  operating  environment.  The  fact  that  an  oper¬ 
ating  environment  is  independent  of  the  protocol  will  allow  us  to  compare 
different  protocols  according  to  their  behavior  in  corresponding  runs. 

In  many  systems  of  interest,  the  environment  reacts  to  the  protocol  being 
followed  by  the  system,  meaning  that  the  input  the  system  received  from  the 
environment  can  depend  on  the  output  to  the  environment  generated  by  the 
system.  One  can  imagine,  for  example,  a  bank  customer  walking  up  to  a 
teller  to  withdraw  $100.  If  the  teller’s  “protocol”  causes  the  teller  to  hand 
the  customer  100  one  dollar  bills,  the  customer  will  probably  ask  for  two  $50 
bills  instead.  If  the  teller’s  “protocol”  causes  the  teller  to  hand  the  customer 
a  single  $100  bill,  the  customer  may  not  ask  for  two  $50  bills.  Because  the 
environment  reacts  differently  to  the  two  teller  protocols,  making  different 
requests  in  the  context  of  the  different  protocols,  it  seems  difficult  to  compare 
the  two  protocols  in  the  context  of  a  fixed  sequence  of  requests  by  the  bank 
customer.  In  contrast,  however,  we  are  interested  in  protocols  that  react  to 
their  environment,  and  not  the  environment’s  reaction  to  the  protocol.  Our 
method  of  comparing  protocols  does  not  allow  us  to  study  the  interaction  of 
protocols  and  their  environment  from  both  points  of  view. 

In  this  work,  we  study  the  behavior  of  protocols  in  the  presence  of  a 
bounded  number  of  failures  (of  a  particular  type)  and  a  given  setting  of 
possible  inputs.  It  is  therefore  natural  to  identify  a  system  with  the  set  of 
all  possible  runs  of  a  given  protocol  under  such  circumstances.  Formally,  a 
system  is  identified  with  the  set  of  runs  of  a  protocol  V  with  n>2  proces¬ 
sors  of  which  at  most  t  <n  — 2  may  be  faulty  (in  the  sense  of  a  particular 
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failure  model  M  defined  above),  where  the  complete  input  history  of  each 
processor  pi  is  an  element  of  a  set  Ff.  We  denote  this  set  of  runs  by  the 
tuple  S  =  {n,t,T ,  MfTu . . .  ^Tn).  Our  definition  of  a  system  ensures  that 
the  input  to  the  system  is  orthogonal  to,  and  hence  carries  no  information 
about,  the  failure  pattern.  In  addition,  since  the  set  of  possible  inputs  in  the 
system  has  the  form  FiX.-.xFn,  one  processor’s  input  contains  no  infor¬ 
mation  about  any  other  processor’s  input,  and  hence  the  only  way  in  which 
processors  obtain  information  about  other  processors’  input  is  via  messages 
communicated  between  the  processors  in  the  system. 

While  a  protocol  may  be  thought  of  as  a  function  of  processors’  states, 
protocols  for  distributed  systems  (as  well  as  protocols  for  sequential  and 
parallel  computation)  Me  typically  written  uniformly  in  terms  of  the  number 
n  of  processors  and  the  number  t  of  failures  tolerated,  for  values  of  n  and 
t  of  virtually  arbitrary  size  (although  requirements  such  as  n  >  2i  must 
sometimes  be  satisfied  in  order  for  the  protocol  to  behave  correctly).  In 
this  sense,  the  protocol  is  parameterized  by  n  and  t,  and  the  actions  and 
messages  required  of  a  processor  by  a  protocol  may  be  viewed  as  depending 
on  n  and  t  as  well  as  the  processor’s  state.  Therefore,  for  the  purposes 
of  this  chapter,  we  assume  that  a  protocol  is  a  function  &om  n,  t,  and  a 
processor’s  local  state  to  a  list  of  actions  the  processor  is  required  to  perform, 
followed  by  a  list  of  messages  the  processor  is  required  to  send.^  Since  each 
protocol  is  defined  for  systems  of  arbitrary  size,  it  is  natural  to  define  a 
class  of  systems  to  be  a  collection  of  systems  {S(n,  t) :  n  >  t  -f  2  >  0},  where 
S(n,  t)  =  r„)  for  some  fixed  protocol  V,  failure  model  M , 

and  input  sets  F^. 


^Notice  that  processors  must  compute  this  function  by  following  some  algorithm.  Thus, 
while  we  formally  define  a  protocol  in  terms  of  functions,  it  is  convenient  to  maintain  both 
views  of  a  protocol  as  a  function  and  an  algorithm. 
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3.3  Simultaneous  Choice  Problems 

In  this  section  we  define  the  class  of  simultaneous  choice  problems  for  which 
we  construct  optimal  protocols,  a  large  class  of  problems  that  capture  the 
essence  of  coordinated  simultaneous  action  in  a  distributed  environment. 
Roughly  speaking,  these  problems  require  that  one  of  a  number  of  alter¬ 
native  actions  be  performed  (or  “chosen”)  simultaneously  by  the  nonfaulty 
processors,  where  for  each  action  we  axe  given  conditions  under  which  the 
action  must  be  performed  and  conditions  under  which  its  performiince  is  for¬ 
bidden.  In  addition  to  these  conditions,  the  specification  of  such  a  problem 
must  also  determine  the  possible  operating  environments  in  which  such  a 
choice  is  to  be  made,  by  specifying  what  inputs  each  processor  may  possibly 
receive  and  what  types  of  processor  failures  are  possible. 

We  think  of  an  action  as  something  special  that  can  be  done  by  a  proces¬ 
sor.  An  action  might  be  writing  the  value  1  to  an  output  register,  or  entering 
some  distinguished  state  such  as  the  “firing”  state  in  the  distributed  firing 
squad  problem.  Formally,  an  action  can  be  modeled  as  a  message  a  processor 
can  send  to  the  environment.  There  is  nothing  about  the  action  itself  that 
restricts  its  performance,  say,  to  time  k  but  not  to  time  A  simultaneous 
action  a  is  an  action  with  two  associated  conditions  pro{a)  and  con(a)  stat¬ 
ing  when  the  action  a  should  or  should  not  be  performed.  Recall  that  a  run 
is  determined  by  a  protocol  and  em  operating  environment;  it  follows  that 
the  operating  environment  is  the  most  general  protocol-independent  aspect 
of  a  run  a  problem  specification  cem  refer  to  when  stating  when  an  action 
should  or  should  not  be  performed.  Consequently,  we  assume  both  pro{a) 
and  cor^^a)  are  facts  about  the  operating  environment. 

A  simultaneous  choice  problem  (or  simply  a  simultaneous  choice)  C  is  de¬ 
termined  by  a  set  {uj, . . .  ,0^}  of  simultaneous  actions  and  their  associated 
conditions,  together  with  a  failure  model  Ad,  and  a  set  F,-  of  complete  in¬ 
put  histories  for  each  processor  pj.  Intuitively,  we  want  all  of  the  nonfaulty 
processors  to  choose  one  of  the  actions  o,-  that  they  can  perform  without 
violating  the  pro{aj)  and  con{aj)  conditions,  and  to  perform  a,-  simultane¬ 
ously.  Since  the  pro(aj)  and  con{aj)  conditions  are  facts  about  the  operating 
environment,  which  means  they  depend  on  the  input  and  failure  patterns, 
we  include  in  the  problem  specification  the  sets  F,-  determining  the  possi¬ 
ble  input  patterns  and  the  failure  model  M  determining  the  possible  failure 
patterns.  (Ad  will  always  be  one  of  the  failure  models  defined  in  Section  3.2.) 
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Loosely  speaking,  we  want  every  run  r  of  a  protocol  implementing  C 
satisfy  the  following  conditions: 

(i)  each  nonfaulty  processor  performs  at  most  one  of  the  a,’s, 

(ii)  any  a,  performed  by  some  nonfaulty®  processor  is  performed  simulta¬ 
neously  by  all  of  them, 

(iii)  Of  is  performed  by  all  nonfaulty  processors  if  r  satisfies  pro(ai),  and 

(iv)  is  not  performed  by  any  nonfaulty  processor  if  r  satisfies  con^ai). 

More  formally,  a  protocol  V  and  the  simultaneous  choice  C  determine  a  class 
of  systems  {S(n,t) :  n  >  t  +  2},  where  S(n,  i)  =  (n,t,7^,  A4,ri, . . .  ,r„).  We 
say  that  V  implements  C  if  every  run  of  every  system  in  the  class  determined 
by  V  and  C  satisfies  the  conditions  (i)-(iv)  above.  A  simultaneous  choice  is 
said  to  be  implementable  (or  satiafiable)  if  there  is  a  protocol  that  implements 
it.  We  note  that  both  V  and  C  are  required  to  completely  determine  a  system 
(a  set  of  runs):  because  a  run  is  determined  by  a  protocol  and  an  operating 
environment,  the  protocol  V  is  clearly  required,  and  the  failure  model  M  and 
input  sets  contributed  by  C-  are  required  to  determine  the  set  of  possible 
operating  environments. 

This  definition  of  a  simultaneous  choice  is  fairly  abstract.  However,  many 
familiar  problems  requiring  simultaneous  action  by  a  group  of  processors  are 
instances  of  a  simultaneous  choice.  In  all  known  cases,  the  conditions  pro{ai) 
and  con{ai)  are  facts  about  the  input  and  the  existence  of  failures,  and  hence 
are  facts  about  the  operating  environment.  (By  the  existence  of  failures  we 
mean  whether  any  failure  whatsoever  occurs  diiring  the  run.  Some  problems 
allow  the  nonfaulty  processors  to  display  default  behavior  in  the  presence  of 
failures;  see  [LF82].)  For  example,  the  distributed  firing  squad  problem  is  a 
simultaneous  choice  consisting  of  a  single  “firing”  action  a,  with  the  condition 
pro{a)  being  the  receipt  of  a  “start”  signal  by  a  nonfaulty  processor,  and  the 
condition  cor^a)  being  that  no  processor  receives  a  “start”  signal.  Each  set  Tj 

^We  have  chosen  the  set  N  of  nonfaulty  piocessots  as  the  set  of  processors  required  to 
perform  actions  simultaneously,  but  the  notion  of  a  simultaneous  choice  problem  may  be 
stated  in  terms  of  many  other  similar  (indexical)  sets  of  processors,  including  the  set  P  of 
all  processors,  with  the  analysis  in  this  section  and  the  next  one  carrying  through  without 
change. 
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of  possible  inputs  simply  allows  for  a  “start”  message  to  be  delivered  to  any 
processor  at  any  time. 

In  addition  to  simultaneous  choice  problems,  we  also  consider  the  closely 
related  class  of  strict  simultaneous  choice  problems.  Both  classes  are  specified 
in  essentially  the  ssmie  way,  except  that  runs  of  a  protocol  implementing  a 
strict  simultaneous  choice  are  required  to  satisfy  the  modified  condition 

(i')  each  nonfaulty  processor  performs  exactly  one  of  the  Of’s, 

together  with  conditions  (ii)-(iv)  above.  All  of  the  results  in  this  chapter 
hold  for  a  strict  simultaneous  choice  as  well  as  a  simultaneous  choice,  and 
henceforth  we  will  mention  explicitly  only  to  a  simultsmeous  choice. 

The  simultaneous  Byzantine  agreement  problem  (see  [DM86,  PSL80])  is 
an  example  of  a  strict  simultaneous  choice.  This  problem  consists  of  an 
action  oq  of  “deciding  0”  and  an  action  ai  of  “deciding  1.”  Each  set  Tj 
of  possible  inputs  consists  of  two  possible  inputs:  one  starting  mth  initial 
value  0  and  receiving  no  further  external  input  during  the  run,  and  the  other 
starting  with  initial  value  1.  The  condition  pro(oo)  is  that  all  initial  values 
are  0,  and  the  condition  pro(ai)  is  that  all  initial  values  are  1.  The  conditions 
con(oo)  and  co»j(ai)  are  both  taken  to  be  false.  Simultaneous  Byzantine 
agreement  is  a  strict  simtiltaneous  choice,  since  the  processors  are  required 
to  decide  either  0  or  1  in  every  run.  Other  related  problems  that  may  also  be 
formulated  as  (strict)  simultaneous  choice  problems  include  weak  Byzantine 
agreement  and  the  Byzantine  Generals  problem  (see  [Fis83]). 

Having  formally  defined  a  simultaneous  choice  (and  a  strict  simultaneous 
choice),  let  us  consider  when  the  specification  of  such  a  problem  disallows 
performing  a  simultaneous  action  a,-.  Clearly,  if  con{ai)  holds  then  perform¬ 
ing  ai  is  disallowed.  In  addition,  since  by  condition  (i)  no  more  than  one 
action  may  be  performed  by  the  nonfaulty  processors  in  any  given  run,  the 
condition  pro{aj),  for  some  j  ^  t,  requires  o,-  to  be  performed,  and  hence 
also  disallows  a,-.  It  is  easy  to  see  that  these  are  the  only  conditions  under 
which  performing  a,-  is  disallowed.  This  motivates  the  following  defimtion: 

enabled{ai)  ^  -icor^^ai)  A  ~'pro{aj). 

Our  discussion  above  implies  that  the  performance  of  em  action  a;  is  allowed 
by  the  problem  specification  iff  the  condition  enabled^ai)  is  satisfied.  No¬ 
tice  that  it  is  possible  for  severed  of  the  conditions  enabled{ai)  to  hold  at 
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once,  in  which  case  performance  of  any  of  the  enabled  actions  is  allowed  by 
the  problem  specification.  In  addition,  it  is  easy  to  see  that  the  formulas 
con(ai)  D  ~>enabled(ai)  and  pro(ai)  D  -lenabled^aj)  (j  ^  t)  are  valid  in  any 
system  in  which  processors  follow  a  protocol  implementing  a  simultaneous 
choice.  Finally,  notice  that  because  the  conditions  pro(aj)  and  con(aj)  are 
facts  about  the  operating  environment,  so  is  each  condition  enabled{ai). 

As  an  example,  notice  that  the  condition  enabled{a)  for  the  distributed  fir¬ 
ing  squad  problem  is  simply  that  some  processor  receives  a  “start”  signal.  For 
the  simultaneous  Byzantine  agreement  problem,  the  condition  enabled{ao)  is 
that  some  initial  value  is  0,  and  the  condition  enabled^ai)  is  that  some  initial 
value  is  1.  Since  for  most  assignments  of  initial  values  both  enablet^ao)  and 
enabled(ax)  hold,  it  is  typically  the  case  that  deciding  either  0  or  1  is  accept¬ 
able.  It  need  not  be  the  case  (and,  in  fact,  usually  will  not  be  the  case)  that 
the  conditions  enabled{ai)  for  a  typical  simultaneous  choice  will  be  mutually 
exclusive. 

Having  formally  defined  the  notion  of  a  simultaneous  action,  we  are  now 
in  a  position  to  carefully  state  the  relationship  between  simultaneous  actions 
and  common  knowledge  mentioned  in  the  introduction:  When  a  simultane¬ 
ous  action  is  performed,  it  is  common  knowledge  that  the  action  is  being 
performed.  The  statement  we  actually  prove  is  that  when  such  an  action  is 
performed,  it  is  common  knowledge  that  the  action  is  enabled.  This  is  the 
first  (and  the  key)  relationship  we  establish  between  common  knowledge  and 
the  performemce  of  simultaneous  actions. 

Lemma  3.1:  Let  r  be  a  run  of  a  protocol  implementing  a  simultaneous 
choice  C.  If  the  action  oj  of  C  is  performed  by  a  nonfaulty  processor  at  time  I 
in  r,  then  {r,l)  \=  C^enabled{ai). 

Proof;  Let  tp  be  the  fact  “uf  is  being  performed  by  a  nonfaulty  processor.” 
A  processor  pj  performing  the  action  a;  cleturly  knows  that  it  is  perform¬ 
ing  Uj.  This  processor  therefore  also  knows  that  if  it  is  nonfaulty,  then  Oj 
is  being  performed  by  a  nonfaulty  processor.  Since  r  is  a  run  of  a  protocol 
implementing  C,  the  action  a,-  is  performed  simultaneously  by  all  nonfaulty 
processors  whenever  it  is  performed  by  a  single  nonfaulty  processor.  It  fol¬ 
lows  that  whenever  ip  holds,  so  does  and  hence  p  D  E^p  is  valid 

in  the  system.  The  induction  riile  implies  that  p  D  C^/p  is  valid  in  the 
system  as  well.  Notice  that  p  D  enabled(ai)  is  valid  in  the  system.  It 
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follows  that  Cj/<p  D  Cj/enabled{ai)  is  valid  in  the  system,  and  hence  so  is 
(f  D  Gtfenabled{ai).  Thus,  (r,^)  |=  tp  implies  (r,/)  |=  Cj/6nabled{ai),  and  we 
are  done.  □ 

In  the  above  proof,  the  essential  fact  that  tp  D  E^tp  is  valid  in  the  system 
depends  crucially  on  our  definition  of  Ej^ttp.  As  discussed  in  Chapter  2,  a 
processor  p  performing  a;  knows  that  a,-  is  being  performed,  but  since  a 
nonfaulty  processor  might  not  know  that  it  is  nonfaulty,  p  might  not  know 
that  ai  is  being  performed  by  a  nonfaulty  processor.  The  processor  p  does 
know,  however,  that  if  it  {p  itself)  is  nonfaulty,  then  a  nonfaulty  processor 
is  performing  ai.  It  is  for  this  reason  that  we  have  been  led  to  choose  our 
definition  of  Ef/<p  as  we  have,  as  discussed  in  Chapter  2. 


3.4  Optimal  Protocols 

In  this  section,  we  show  how  to  extract  a  high-level  optimal  protocol  for  a 
simultaneous  choice  problem  directly  from  its  specification.  (As  mentioned 
in  the  introduction,  we  use  the  word  optimal  as  shorthand  for  optimal  in  all 
runs;  recall  that  this  optimality  is  in  terms  of  the  number  of  roimds  required 
to  perform  a  simultaneous  choice.)  We  begin  by  considering  a  simple  class 
of  protocols  that  will  serve  as  a  building  block  in  the  design  of  such  optimal 
protocols.  Recall  that  we  think  of  a  protocol  as  having  two  components, 
an  action  protocol  and  a  message  protocol.  A  protocol  is  said  to  be  a  full- 
information  protocol  (cf.  [Had83,  FL82,  PSL80])  if  its  message  protocol  is: 

repeat  every  round 

send  current  local  state  to  all  processors 
forever. 


Intuitively,  since  such  a  protocol  requires  that  all  processors  send  all  of  the 
information  available  to  them  in  every  round,  one  would  expect  this  protocol 
to  give  each  processor  as  much  information  about  the  operating  environ¬ 
ment  us  any  protocol  could.  In  particular,  the  following  result  shows  that 
if  a  processor  cannot  distinguish  two  operating  environments  during  runs 

rk'f  n  •fiill-iTt'frvT'Trio 
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operating  environments  during  runs  of  any  other  protocol. 
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Lemma  3.2:  Let  r  and  r'  be  runs  of  a  full-information  protocol  T,  and 
let  s  and  s'  be  runs  of  an  arbitrary  protocol  V  corresponding  to  r  and  r', 
respectively.  For  all  processors  q  and  times  I,  if  r,(/)  =  r'(^)  then  «,(/)  = 

Proof:  We  proceed  by  induction  on  the  time  1.  The  case  of  f  =  0  is 
immediate  since  q  must  have  the  same  initial  state  in  both  r  and  r',  and 
hence  also  in  s  and  s'.  Suppose  /  >  0  and  the  inductive  hypothesis  holds  for 
all  processors  p  at  time  t—1.  The  local  state  of  q  at  time  I  is  determined  by 
its  locd  state  at  time  f  —  1,  the  (external)  input  it  receives  during  round  I, 
and  the  messages  it  receives  during  round  1.  Since  q  has  the  same  local  state 
at  time  ^  —  1  in  r  and  r',  by  the  inductive  hypothesis,  the  same  is  true  in  s 
and  s'.  Since  q  receives  the  same  input  during  round  Imr  and  r',  the  same 
is  true  in  s  and  s'.  If  q  does  not  receive  a  message  from  p  during  round  I 
in  r  and  r',  then  both  operating  environments  determine  that  no  message 
from  p  to  q  during  round  I  is  delivered.  Thus,  q  does  not  receive  a  message 
from  p  during  round  i  in  either  s  or  s'.  If  q  does  receive  a  message  from  p 
during  round  i  in  r  and  r',  then  both  operating  environments  determine  that 
any  message  from  p  to  9  during  round  t  is  delivered.  If  q  receives  a  message 
from  p  during  round  tofr  and  r',  then  since  q  must  receive  the  same  message 
from  p  in  both  r  and  r',  the  local  state  of  p  must  be  the  same  at  time  f  —  1  in  r 
and  r'.  By  the  inductive  hypothesis,  p’s  local  state  at  time  1—1  must  also  be 
the  seune  in  s  and  s'.  Since  V  ia  n  deterministic  function  of  processor  states, 
q  receives  the  same  messages  from  p  during  round  f  in  a  and  s'.  Thus,  q  heis 
the  same  local  state  at  time  f  in  a  smd  a'.  □ 

Thus,  roughly  speaking,  processors  learn  the  most  about  the  operating 
environment  during  runs  of  full-information  protocols.  The  following  corol¬ 
lary  of  Lemma  3.2  shows  that  facts  about  the  operating  environment  become 
common  knowledge  during  runs  of  such  protocols  at  least  as  soon  as  they  do 
during  runs  of  any  other  protocol.  This  result  captures  in  a  precise  sense  a 
property  of  full-information  protocols  that  is  essential  to  our  analysis. 

Corollary  3.3:  Let  ^  be  a  fact  about  the  operating  environment.  Let  r 
and  s  be  corresponding  runs  of  a  full-information  protocol  T  and  an  arbitrary 
protocol  7^,  respectively.  If  (s,/)  |=  Cn^  then  (r,/)  [=  Ctf<p. 

Proof:  Suppose  that  (s,f)  |=  Gn^p.  We  will  prove  that  (r,/)  |=  by 
showing  that  (r',^)  }=  y?  for  all  runs  r'  of  T  such  that  (r,/)  ~  (r',f);  that  is. 
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that  (r,  £)  and  (r',  i)  are  in  the  same  connected  component  of  the  similarity 
graph.  Fix  r',  and  let  s'  be  the  run  of  V  corresponding  to  r'.  Lemma  3.2 
and  a  simple  inductive  argument  on  the  distance  between  (r,  1)  and  (r',  i)  in 
the  similarity  graph  show  that  (r,/)  ~  implies  (s,£)  ^  («',/)•  Since 
(s,^)  f=  C'vV’i  we  have  (s',/)  |=  Since  corresponding  runs  satisfy  the  same 
facts  about  the  operating  environment,  (s',/)  1=  ^  implies  (r',/)  (=  It 
follows  that  (r,  /)  [=  Cj/^.  □ 

We  are  now  in  a  position  to  describe  how  to  construct  optimal  proto¬ 
cols  for  simultane  ^us  choice  problems.  Recall  that  when  a  simultaneous 
action  a,-  is  performed,  Lemma  3.1  implies  that  enabled{ai)  must  be  com¬ 
mon  knowledge.  Since  enabled{ai)  is  a  fact  about  the  operating  environment. 
Corollary  3.3  implies  that  enabled{ai)  becomes  common  knowledge  in  runs 
of  a  full-information  protocol  as  soon  at  it  does  in  corresponding  runs  of 
any  other  protocol.  Thus,  given  an  effective  test  that  the  nonfaulty  proces¬ 
sors  can  use  to  determine  whether  enabled{ai)  is  common  knowledge,  a  test 
we  denote  by  te3t-for-Cf/enabled{a{),  the  following  protocol  J^c  is  an  optimal 
protocol  for  C: 

no.actionjperformed  *-  true; 
repeat  every  lonnd 

if  no-actionjperformed  and 

tesi-for-Cj^fenabled{ai)  returns  true  for  some  o,- 

then 

j  <— min{i  :  tesi-for-Ctfenabhd{ai)  returns  true}, 
perform  aj, 

no-actionjptrformed  <—  false; 
send  current  local  state  to  every  processor; 
forever. 

Before  formally  proving  that  J^c  is  an  optimal  protocol,  we  must  define  more 
formally  the  tests  for  common  knowledge  that  appear  in  Tc.  Recall  that  the 
fixpoint  axiom  implies  that  Cz/ip  D  Ej/Cf/<p  is  valid.  This  guarantees  that 
follows  from  the  local  state  of  each  nonfaulty  processor  whenever 
holds.  In  ether  words,  since  implies  Ej/Cj/^P  which,  for  every  nonfaulty 
processor  p,-,  implies  each  nonfaulty  processor  can  determined  from 

its  local  state  that  Cj^ip  holds.  This  is  not  true  for  faulty  processors. 

It  is  therefore  natural  to  define  a  test  for  common  knowledge  of(p,  denoted 
as  above  by  test-for-Cj^fp,  to  be  a  test  that,  given  the  local  state  of  a  nonfaulty 
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processor  at  {r,t)  (together  with  n  and  t),  returns  true  iff  Cj^<p  holds  at 
(r,l).  Such  a  test  may  return  either  true  or  false  when  given  the  local 
state  of  a  faulty  processor.  Let  us  denote  by  Aj(r,/)  the  set  of  actions  Of 
such  that  test-for-Guenabled(ai)  returns  true  when  given  the  loced  state  of  pj 
at  (r,^).  Notice  that  if  pj  is  nonfauity,  then  Aj{r,i)  is  precisely  the  set  of 
actions  a;  such  that  Cj/enabled{ai)  holds  at  (r,f).  It  follows  that  for  all 
nonfauity  processors  pj  the  sets  Aj  are  equal  at  all  times.  In  particular,  all 
become  nonempty  at  the  same  time  (as  soon  as  enablec^ai)  becomes  common 
knowledge  for  some  Oj).  Thus,  if  all  processors  pj  choose  the  action  of  least 
index  from  A.j  as  soon  as  this  set  becomes  nonempty,  as  required  by  .Fc, 
then  all  nonfauity  processors  choose  the  same  action  simultaneously.  We  cein 
now  prove  that  Tc  is  an  optimad  protocol  for  C.  (Recall  that  a  simultaneous 
choice  problem  is  implementable  iff  there  exists  a  protocol  that  implements 
it.) 

Theorem  3,4:  If  C  is  an  implementable  simultaneous  choice  problem,  then 
is  an  optimal  protocol  for  C. 


Proof:  We  first  prove  that  nonfatilty  processors  perform  actions  in  runs 
of  !Fc  as  soon  as  they  do  in  corresponding  runs  of  any  protocol  implement¬ 
ing  C.  Let  r  be  a  run  of  .Fc,  and  let  s  be  the  corresponding  run  of  a 
protocol  implementing  C.  Lemma  3.1  implies  that  if  Ui  is  performed  by 
a  nonfauity  processor  at  time  t  in  s,  then  (s,f)  |=  Cucnabled{ai).  Since 
enabled^ai)  is  a  fact  about  the  operating  environment.  Corollary  3.3  implies 
that  (r,f)  1=  C f/enabled{ai).  As  a  result,  Aj{r,l)  must  be  nonempty  for  all 
nonfauity  processors  pj,  and  hence  each  must  perform  an  action  in  r  no  later 
that  time  t  It  follows  that  nonfauity  processors  perform  actions  in  runs  of  !Fc 
as  soon  as  they  do  in  corresponding  runs  of  any  protocol  implementing  C. 

We  now  show  that  J^c  actually  implements  C.  Let  r  be  a  run  of  .Fc.  First, 
it  is  obvious  from  the  definition  of  J^c  that  each  nonfauity  processor  performs 
at  most  one  action  in  r.  (If  C  is  an  implementable  strict  simultaneous  choice, 
then  the  preceding  discussion  shows  that  the  nonfauity  processors  perform 
exactly  one  action  in  r.)  Second,  if  a  nonfauity  processor  pj  performs  an 
action  Cj  at  time  £  during  r,  then  time  £  is  the  first  time  at  which  Aj{r,  k) 


is  nonempty,  and  Uj  is  the  action  of  least  index  in  this  sat.  Since  Aj{v^k)  =■ 


AmiT^k)  for  all  nonfauity  processors  pm,  the  same  is  true  for  all  nonfauity 


processors.  As  a  result,  all  nonfauity  processors  must  choose  to  perform  a; 
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simultaneously  at  time  £.  Third,  if  r  satisfies  pro(a{),  then  the  run  s  of 
any  protocol  implementing  C  corresponding  tc  r  must  satisfy  pro(ai),  and 
hence  Oj  must  be  performed  in  5.  As  we  have  already  seen,  an  action  must 
also  be  performed  in  r.  Since  pro(ai)  D  ->enabled(aj)  for  all  j  t,  the  set 
Aj(r,  k)  of  a  nonfatilty  processor  pj  must  contain  no  action  other  than  o,-  (if 
it  contains  any  action  at  all).  Thus,  a,-  must  be  the  action  performed  in  r. 
Finally,  if  r  satisfies  con(a,-),  then  r  does  not  satisfy  enai/ed(a,-),  and  no  set 
Aj{r,  t)  for  any  nonfaulty  processor  pj  contains  a,-.  Thus,  a;  is  not  performed 
in  r.  It  follows  that  implements  C.  □ 

As  a  result  of  Theorem  3.4,  we  see  that  full-information  protocols  can 
be  used  as  the  basis  of  optimeil  protocols  for  simultaneous  choice  problems. 
Thus,  we  will  restrict  our  attention  to  full-information  protocols  in  the  re¬ 
mainder  of  this  chapter:  Uidess  otherwise  specified,  all  protocols  mentioned 
will  be  full-information  protocols,  and  all  runs  will  be  runs  of  such  protocols. 
More  important,  however,  a  consequence  of  Theorem  3.4  is  that  designing 
an  optimal  protocol  for  a  simultaneous  choice  problem  C  essentially  reduces 
to  testing  for  common  knowledge  of  certain  facts:  In  order  to  design  an 
optimal  protocol  for  C,  it  is  enough  to  construct  the  tests  for  common  knowl¬ 
edge  of  the  facts  tnahhd{ai).  We  note  that  the  fundamental  property  of 
common  knowledge  underlying  the  existence  of  such  tests  is  the  fact  that 
D  EtfCfffp  is  valid;  that  is,  when  <p  becomes  common  knowledge,  the 
fact  that  (f  is  common  knowledge  will  follow  from  the  local  state  of  every 
nonfaulty  processor.  The  problem  of  implementing  such  tests  is  the  subject 
of  the  following  section. 

Before  ending  this  section,  however,  we  consider  the  size  of  messages  re¬ 
quired  by  a  full-information  protocol  T.  Such  a  protocol  requires  processors 
to  send  their  entire  local  state  during  every  round.  Since,  strictly  speaking, 
the  size  of  a  processor’s  state  may  be  exponential  in  the  number  of  rounds 
elapsed,  this  protocol  seems  to  require  processors  to  send  messages  of  expo¬ 
nential  length.  We  now  show,  however,  that  in  the  variants  of  the  omissions 
model  we  consider  in  this  work  there  is  a  simple,  compact  representation  of 
a  processor’s  state  that  may  be  sent  instead.  Consequently,  it  will  be  possi¬ 
ble  to  implement  all  full-information  protocols  (and  in  particular  the  optimal 
protocol  jFc)  in  a  communication-efiicient  way  in  all  variants  of  the  omissions 
model.  We  note  that  this  representation  depends  heavily  on  the  fact  that 
the  only  faulty  behavior  a  faulty  processor  may  exhibit  involves  the  loss  of 
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a.  ff(r,3) 


b.  ai(r,3) 


Figure  3.1:  Communication  graphs. 


messages.  The  technique  does  not  work  in  the  Byzantine  models  where  pro¬ 
cessors  may  send  incorrect  messages  in  addition  to  losing  messages.  Results 
of  [Coa86,  Mic88,  Mic89]  show  how  the  size  of  messages  in  such  models  may 
be  reduced. 

Given  a  run  r  of  .F,  the  communication  graph  of  r  (see  Figure  3.1)  repre¬ 
sents  the  messages  delivered  in  r.  It  is  a  layered  graph  (with  one  layer  corre¬ 
sponding  to  every  natural  number,  representing  time  on  the  global  clock)  in 
which  each  processor  is  represented  by  one  node  in  every  layer.  We  denote 
the  node  representing  processor  p  at  time  I  by  (p,  1) .  Edges  connect  nodes 
in  adjacent  layers,  with  an  edge  between  (p,  k  —  \)  and  (q,  k)  iff  a  message 
&om  p  is  delivered  to  q  during  round  k.  The  labeled  communication  graph 
is  obtained  by  labeling  the  layer  0  nodes  of  the  communication  graph  with 
processors’  names  and  initial  states,  and  by  labeling  the  layer  k  nodes  (for 
k  >  0)  with  the  requests  the  processors  receive  from  external  clients  during 
round  k.  We  note  in  passing  that,  since  r  is  a  run  of  the  full-information  pro¬ 
tocol  its  labeled  communication  graph  uniquely  determines  its  operating 
environment.  For  every  point  (r,  /),  we  denote  by  ^(r,  /)  the  first  /  -f  1  lay¬ 
ers  of  the  labeled  communication  graph  of  r,  representing  the  first  1  rounds 
of  r.  For  example,  illustrated  in  Figure  3.1(a)  is  a  graph  G{r,Z)  depicting 
the  first  3  rounds  of  a  run  r.  We  say  that  G{t,1)  has  length  1. 
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Informally,  at  every  point  (r,^)  of  a  run  of  a  processor  p,’s  local  state 
corresponds  to  a  certain  subgraph  Qi{r,i.)  of  G{r,l).  For  example,  the  sub¬ 
graph  Qi{r,  3)  of  Q{r,  3)  is  illustrated  in  Figure  3.1(b).  We  define  the  subgraph 
Qi{r,i)  of  ^(r,^)  inductively  as  follows.  For  ^  =  0  the  subgraph  ^i(r,0)  con¬ 
sists  of  the  labeled  node  (pi,0).  For  /  >  0  the  subgraph  ^i(r,/)  consists  of 
the  labeled  node  {pi,l),  the  subgraph  Gi{r,t—  1),  the  edges  from  layer  /  —  1 
nodes  to  (p,-,^),  and  the  subgraphs  —  1)  for  every  layer  t  —  1  node 

{Pj,i  —  1)  adjacent  to  (pul).  Given  a  set  S  of  processors,  it  is  convenient 
to  denote  by  Gsir,  i)  the  union  of  the  graphs  Giir,  1)  for  every  p,-  €  S.  We 
remark  that  Gsi'f',  i)  uniquely  determines  ^i(r,  1)  for  every  pi  €  S.  The  next 
lemma  states  that  a  processor’s  state  of  the  labeled  communication  graph 
uniquely  determines  its  view  of  the  run. 

Lemma  3.5:  Let  r  and  r'  be  runs  of  a  full-information  protocol  J-.  For 
every  processor  p,-  and  time  i,  ri{t)  =  ri(/)  iff  Gi{r,  1)  =  Gi{r',  1). 

Proof:  We  proceed  by  induction  on  t.  The  case  of  /  =  0  is  immediate. 
Suppose  £>  0  and  the  inductive  hypothesis  holds  for  /  —  1. 

Suppose  Pi  has  the  same  local  state  at  time  t  in  both  r  and  r'.  This 
implies,  in  particular,  that  pi  has  the  same  local  state  at  time  /—  1  in  r  and  r', 
and  from  the  inductive  hypothesis  it  follows  that  Giir,l~  1)  =  Gi{r',l  -  1). 
In  addition,  this  implies  that  p,-  must  receive  the  same  input  during  round  I 
in  r  and  r',  and  hence  (pi,/)  is  labeled  with  the  same  input  in  Gi{r,i)  and 
Gi{r',l).  If  Pi  does  not  receive  a  message  from  a  processor  pj  during  round  i 
in  r  and  r',  then  there  is  no  edge  from  {pj,l-  1)  to  {pi,l}  in  either  Gi{r,£) 
or  Gi{r',£).  If  Pi  does  receive  a  message  from  a  processor  py  during  round  £ 
in  r  and  r',  then  it  receives  the  same  message  in  both  runs,  and  py  must 
have  the  same  local  state  at  time  /  —  1  in  both  runs.  Hence,  there  is  an  edge 
from  {pj,£  -  1)  to  {pi,£)  in  both  Gi{r,£)  and  and  by  the  inductive 

hypothesis  we  have  that  Gj{r,  £-1)=  Gj{r',  £  -  1).  Thus,  Gi{r,  £)  =  Gi(r',  £). 

Conversely,  suppose  Gi{r,£)  =  Gi{r*j£).  It  follows  that  Gilr,£  -  1)  = 
Gi{r',£—  1),  and  by  the  inductive  hypothesis  p<  has  the  same  local  state  at 
time  /— 1  in  r  and  r'.  The  node  {pi,£)  must  be  labeled  with  the  same  input  in 
Gi{r,t)  and  Gi{r',£),  so  p,-  receives  the  same  input  during  round  /  in  r  and  r'. 
The  edges  from  layer  £—  1  nodes  to  (p*,  £}  are  the  same  in  ^j(r, £)  and  Gi(r\  £), 
so  Pi  receives  messages  from  the  same  processors  during  round  /  in  r  and  r'. 
Again,  Gj(r,£-1)  =  ^y(r',/-l)  for  every  node  {pji£-  1)  adjacent  to  (pi,f}. 


56 


CHAPTER  3.  PROGRAMMING  SIMULTANEOUS  ACTIONS 


and  by  the  inductive  hypothesis  pj  has  the  same  local  state  at  time  ^  —  1 
in  T  and  r'.  Since  T  requires  that  every  processor  send  its  entire  local  state 
in  every  round,  pi  receives  the  same  messages  during  round  /  in  r  and  r'.  It 
follows  that  Pi  has  the  same  local  state  at  time  I  in  both  r  and  r'.  □ 

Lemma  3.5  implies  that  a  processor’s  local  state  and  its  view  of  the 
corresponding  labeled  communication  graph  convey  the  same  information: 
Given  either  the  graph  Qi{r,t)  or  the  local  state  rj(/),  reconstructing  the 
other  is  straightforward.  Therefore,  an  equivalent  implementation  of  a  full- 
information  protocol  is  one  in  which  the  processors  send  the  labeled  commu¬ 
nication  graphs  corresponding  to  their  local  states  instead  of  sending  their 
entire  local  states.  From  now  on,  we  will  use  the  term  full-information  proto¬ 
col  to  refer  to  this  equivalent  form.  It  is  easy  to  see  that  the  size  of  Qi{r,  1)  is 
polynomial  in  the  number  of  processors  n,  the  global  time  t,  and  the  size  of 
the  requests  received  from  external  clients.  It  follows  that  messages  required 
by  a  full-information  protocol  are  of  polynomial  size.^  Furthermore,  given  the 
labeled  communication  graphs  corresponding  to  the  local  states  at  time  /  —  1 
of  the  processors  that  send  messages  to  a  given  processor  Pi  during  round  I, 
it  is  easy  to  construct  the  labeled  communication  graph  corresponding  to  p,  ’s 
local  state  at  time  1.  Thus,  the  use  of  such  compact  representations  of  a  pro¬ 
cessor’s  state  is  computationally  efficient  as  well  as  communication  efficient. 
Finally,  recall  that  we  have  formally  defined  a  test  for  common  knowledge  to 
be  a  function  of  processor  states  (as  well  as  n  and  t).  In  light  of  the  preced¬ 
ing  discussion,  there  is  no  loss  of  generality  in  assuming  that  such  a  test  is 
a  function  of  communication  graphs  corresponding  to  processor  states.  We 
now  turn  to  the  problem  of  implementing  such  tests. 

3.5  Testing  for  Common  Knowledge 

The  previous  section  established  the  claim  that  tests  for  common  knowledge 
provide  a  very  powerful  programming  technique:  The  design  of  optimal  pro¬ 
tocols  for  simultaneous  choice  problems  reduces  to  implementing  tests  for 
common  knowledge  of  certain  facts.  In  this  section  we  investigate  the  prob¬ 
lem  of  implementing  tests  for  common  knowledge  in  the  different  variants  of 

'*In  the  Byzantine  {ailuie  models,  however,  in  which  processors  are  allowed  to  lie  (or 
maliciously  deviate  from  the  protocol),  we  know  of  no  such  compact  representations.  See 
[Coa86]  for  a  trade-off  between  message  size  and  running  time  possible  in  such  models. 
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the  omissions  model.  With  such  tests,  we  vrill  be  able  to  construct  optimal 
protocols  for  simultaneous  choice  problems  in  these  models.  As  we  will  see, 
properties  of  the  different  variants  of  the  omissions  model  cause  dramatic  dif¬ 
ferences  in  the  complexity  of  testing  for  common  knowledge.  In  addition,  the 
optimal  protocols  we  construct  will  have  interesting  properties  that  vary  ac¬ 
cording  to  the  failure  model.  We  will  discuss  these  properties  as  we  consider 
each  variant  later  in  this  section. 

Recall  that  a  protocol  is  a  function  that,  given  the  number  of  processors  n, 
the  bound  t  on  the  number  of  faulty  processors,  and  a  processor’s  state,  yields 
a  list  of  the  actions  the  processor  should  perform,  as  well  as  the  messages 
it  should  send  in  the  next  round.  (Thus,  the  protocols  we  are  interested  in 
are  uniform  in  n  and  t.)  Since  the  protocols  we  will  be  concerned  with  are 
full-information  protocols,  processors’  states  will  be  efficiently  representable 
by  labeled  communication  graphs.  We  will  soon  restrict  our  attention  to 
simultaneous  choice  problems  in  which  the  external  requests  are  of  constant 
size  (or,  equivalently,  to  problems  involving  only  a  constant  number  of  possi¬ 
ble  requests  from  external  clients).  This  restriction  implies  that  processors’ 
states  at  time  I  will  be  of  size  polynomial  in  n  and  1.  A  protocol  will  there¬ 
fore  determine  the  messages  and  actions  required  at  time  I  based  on  input 
of  size  polynomial  in  n  and  t.  Consequently,  we  will  measure  the  complexity 
of  computations  performed  by  protocols  at  time  t  in  systems  of  n  processors 
as  a  function  of  n  and  t\  By  polynomial  time,  polynomial  space,  etc.,  we  will 
mean  polynomial  in  n  and  t. 

The  definition  of  simultaneous  choice  problems  presented  in  Section  3.3 
is  very  general,  so  general,  in  fact,  that  it  is  possible  to  define  simultane¬ 
ous  choice  problems  with  a  variety  of  anomalous  properties.  For  example, 
it  is  possible  to  define  a  simultaneous  choice  problem  in  which  pro(a)  is  the 
fact  <p  =  “the  first  round  in  which  p  receives  an  external  request  is  a  round 
whose  number  is  the  index  of  a  halting  Turing  machine”  (in  some  a  priori 
well-defined  enixmeration  of  Turing  machines),  and  coi^a)  ic  ->fp.  Clearly, 
since  it  is  imdecidable  whether  (p  holds  even  given  the  local  state  of  p  af¬ 
ter  it  receives  its  first  request,  it  will  also  be  undecidable  which  of  Ct/(p  or 
holds  when  processor  p’s  local  state  becomes  common  knowledge.  It 
follows  that  this  simultaneous  choice  problem  cannot  be  effectively  imple¬ 
mented  by  a  computable  protocol.  Similarly,  one  can  construct  simultaneous 
choice  problems  in  which  evaluation  of  the  conditions  is  intractable,  rather 
than  undecidable  as  in  the  above  example.  It  is  also  possible  to  introduce 


58 


CHAPTER  3.  PROGRAMMING  SIMULTANEOUS  ACTIONS 


anomalies  by  defining  the  sets  Ti  of  external  inputs  in  strange  ways.  Since 
we  are  not  interested  in  problems  involving  such  inherent  anomalies,  we  will 
avoid  them  by  making  restrictions  on  the  relevant  facts  and  the  inputs  arising 
in  the  simtdtaneous  choice  problems  we  will  consider  in  the  sequel. 

We  first  define  the  class  of  practical  facts,  which  will  be  used  to  restrict 
the  conditions  that  specify  a  simultaneous  choice  problem.  Roughly  speak¬ 
ing,  one  essential  property  of  a  practical  fact  ip  is  that  it  is  easy  to  determine 
from  a  processor’s  state  whether  a  run  satisfies  (p.  More  formally,  we  denote 
by  the  property  of  being  a  run  r'  such  that  QsiT^l)  =  ^5(/,/). 

Consequently,  if  Gs{i',  t)  D  <p  is  valid  in  a  system,  then  every  run  r'  of  the 
system  satisf3ring  Qsir^t)  =  must  also  satisfy  p.  In  this  case,  we 

say  that  Gsi’f'il)  determines  p.  Notice,  for  example,  that  no  finite  labeled 
communication  graph  Gs{t,  can  determine  that  a  run  is  failure-firee  (since 
the  run  is  infinite,  and  a  failure  can  always  happen  outside  the  finite  scope 
depicted  by  the  graph).  With  this  notion  in  mind,  a  fact  p  is  said  to  be 
practical  within  a  class  of  systems  {S(n,  t) :  n  >  t  -f  2}  if  the  following  con¬ 
ditions  hold:  (i)  y)  is  a  fact  about  the  input  and  the  existence  of  failures, 
and  (ii)  there  is  a  polynomial-time  algorithm  to  determine,  given  n,  t,  and  a 
graph  Gs{r,l)  of  a  point  of  S(n,t),  whether  Gsi'i'i^)  D  is  valid  in  S(n,t). 
The  first  condition  is  justified  by  the  fact  that  we  will  be  testing  for  common 
knowledge  of  the  conditions  enahled^ai)  arising  from  natural  simultaneous 
choice  problems,  and  such  conditions  we  typically  conditions  on  the  input 
and  existence  of  failures.  The  second  condition  ensures  that  it  is  easy  to  test 
whether  a  labeled  communication  graph  determines  that  the  fact  holds.  (We 
make  this  restriction  since  it  would  clearly  be  unreasonable  to  expect  the 
processors  to  be  able  to  efficiently  identify  and  act  bsised  on  facts  that  axe 
intractable  to  compute  from  the  labeled  communication  graph.) 

We  now  consider  a  natural  restriction  on  the  sets  Fj  of  possible  inputs. 
A  class  of  systems  is  said  to  be  practical  if  there  are  two  fixed  finite  sets  S 
and  M  of  initial  states  and  external  requests,  respectively,  such  that  each  Fj 
in  all  systems  of  the  class  is  the  set  of  complete  input  histories  whose  initial 
state  is  in  S,  and  in  which  the  input  received  in  every  round  is  a  subset 
of  M.  This  condition  ensures  that  the  input  sets  are  of  a  simple  form.  In 
particular,  it  implies  that  sdl  F{’s  are  identical,  and  that  the  input  received 
by  a  processor  during  any  given  round  is  of  constant  size. 

Having  defined  the  notions  of  practical  facts  and  practical  classes  of  sys¬ 
tems,  we  say  that  a  simultaneous  choice  C  is  practicalii  (i)  the  class  of  systems 
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determined  by  a  full-information  protocol  and  C  is  practical,  and  (ii)  each 
condition  enablec^ai)  is  practical  within  this  class  of  systems.  Essentially 
all  natural  simultaneous  choice  problems  are  practical.  In  particular,  all  si¬ 
multaneous  choice  problems  appearing  in  the  literature  are  practical.  Our 
analysis  will  hence  be  restricted  to  testing  for  common  knowledge  of  practi¬ 
cal  facts  and  to  designing  and  implementing  optimal  protocols  for  practical 
simultaneous  choice  problems.  We  remark,  however,  that  our  analysis  will 
apply  to  a  more  general  class  of  simultaneous  choice  problems,  whose  precise 
characterization  is  somewhat  complicated. 

In  Section  3.4  we  programmed  protocols  for  simultaneous  choice  problems 
in  a  high-level  language  in  which  processors’  actions  depend  on  explicit  tests 
for  common  knowledge.  Recall  that  iesi-for-Cusnablec^ai)  is  a  test  nonfaulty 
processors  can  use  to  determine  whether  enabled{ai)  is  common  knowledge: 
Given  the  graph  corresponding  to  the  local  state  of  a  nonfaulty  processor  at 
(r,/)  as  input,  te3t-for-C,/enabled{ai)  returns  true  iff  (r,/)  [=  Cj./enabled{ai). 
Theorem  3.4  implies  that  given  such  a  test  for  each  condition  enabled{ai), 
the  protocol  is  an  optimal  protocol  for  C.  Until  this  point,  however,  we 
have  sidestepped  the  issue  of  whether  such  tests  actually  exist.  With  the 
next  lemma  we  see  that,  for  practical  simultaneous  choice  problems,  such 
tests  can  be  implemented  in  polynomial  space. 

Lemma  3.6:  If  C  is  a  practical  simultaneous  choice  problem,  then  for  each 
action  Oj  the  test  test-for-Cf/enablec^ai)  can  be  implemented  in  polynomial 
space. 

Proof:  We  must  prove  the  existence  of  an  algorithm  test-for-Cj/  enabled{ai) 
determining  in  polynomial  space  whether  enabled{ai)  is  common  knowledge 
at  (r,f),  given  as  input  n,  t,  and  the  graph  Qj{T,t)  corresponding  to  the  local 
state  of  a  nonfaulty  processor  pj  at  (r,  t).  We  will  actually  exhibit  a  nondeter- 
ministic,  polynomial-space  algorithm  Ai  determining  whether  enable^ai)  is 
not  common  knowledge  at  (r,  f ).  Since  NPSPACE— PSPACE  and  PSPACE  is 
closed  under  complementation  (see  [HU85]),  the  existence  of  the  algorithm  Aj 
implies  the  existence  of  an  algorithm  test-for-C/renabled(^ai). 

Let  {S(n,  t)  :  71  >  t  -H  2}  be  a  class  of  systems  determined  by  a  full- 
information  protocol  and  the  problem  C.  We  claim  that  such  an  algorithm  Ai 
need  only  guess  a  point  {s,l)  with  the  property  that  D  enabled{ai) 

is  not  valid  in  E(n,t),  guess  the  path  from  (r,/)  to  (a,^)  in  the  similarity 
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graph  proving  that  (r,£)  ~  (3,/),  and  then  verify  that  these  two  conditions 
hold.  To  see  this,  notice  that  since  Q(s,£)  D  enabled(ai)  is  not  valid  in 
the  system,  there  must  be  a  point  (s',/)  such  that  Q(s,i)  =  Q(s',/)  and 
(s',i)  ^  enabled{ai).  Construct  the  run  with  the  input  of  s'  in  which  pro¬ 
cessors  fail  precisely  as  they  do  in  s  for  the  first  t  rounds,  and  in  which  no 
processor  fails  after  time  t.  Let  u  be  a  run  obtained  by  adding  to  this  run  a 
single  failure  after  time  I  iff  there  is  a  failure  in  s'.  Since  u  and  s'  must  satisfy 
the  same  facts  about  the  input  and  existence  of  failures,  {s',l)  ^  enablecl{ai) 
implies  (u,  £)  ^  enabled{ai).  Since  at  least  one  nonfaulty  processor  in  s  is  non- 
faulty  in  u,  and  also  has  the  same  local  state  at  time  £  since  Q(u,£)  =  Q{Sf£), 
we  have  (s,^)  ~  {u,£).  Therefore,  {t,£)  ~  (u,/)  and  {u,£)  )/=■  enabled{ai), 
and  it  follows  that  (r,^)  ^  Cuenabled{ai). 

We  now  describe  the  dgorithm  Ai  in  greater  detail.  Notice  that  since  C 
is  practical,  the  input  received  by  a  processor  in  every  round  of  a  run  of 
S(n,  t)  is  of  constant  size,  and  hence  it  is  possible  to  construct  the  labeled 
communication  graph  of  any  point  of  S(n,  t)  in  polynomial  space. 

The  algorithm  Ai  first  guesses  the  point  (s,  £)  and  writes  it  down  in  poly¬ 
nomial  space.  Since  enabled{ai)  is  a  practical  fact,  Ai  can  show  in  polynomial 
time  (and  hence  in  polynomial  space)  that  G{s,£)  D  enabkd{ai)  is  not  valid 
in  the  system  S(n,t). 

The  algorithm  then  guesses  the  path  from  (r,  £)  to  (s,  £)  in  the  similarity 
graph  step  by  step,  verifying  each  step  in  polynomial-space  as  it  goes.  The 
algorithm  Ai  begins  by  constructing  the  graph  Q{r',  £)  of  a  run  r'  by  adding 
to  the  graph  Qj{r,  £)  received  as  input  all  edges  not  recorded  as  missing  in 
Gj(r,£).  Notice  that  since  pj  is  nonfaulty  in  r,  it  is  nonfaulty  in  r'  as  well, 
and  hence  (r,^)  ~  The  algorithm  Ai  then  shows  that  {r',£)  ~  {s,t) 

(and  hence  that  (r,^)  ~  (5,^))  in  polynomial  space  by  constructing  one  by 
one  the  graph  ^(ui,^)  of  each  point  {ui,£)  in  a  path  from  (r',/)  to  (a,/) 
in  the  similarity  graph.  For  each  pair  of  points  (ui_i,/)  and  {ui,£),  the 
algorithm  shows  that  some  nonfaulty  processor  p*  has  the  same  local  state 
at  both  points  by  choosing  Pk,  exhibiting  for  each  point  an  assignment  of 
faulty  processors  (consistent  with  their  respective  graphs)  in  which  pk  is 
nonfaulty,  and  showing  that  pk  has  the  same  local  state  at  both  points  by 


verifying  =  Gi.(u--£'^, 


□ 


It  is  important  to  realize  that  Lemma  3.6  holds  in  all  variants  of  the 
omissions  model:  The  failure  model  is  a  peurameter  of  a  simultaneous  choice 
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problem,  and  we  have  made  no  assumptions  restricting  the  failure  model  in 
this  result.  We  note  that  the  proof  of  Lemma  3.6  actually  shows  that  testing 
for  common  knowledge  of  any  practical  fact  can  be  done  in  polynomial  space. 
In  fact,  the  proof  shows  that  such  tests  have  effective  implementations  even 
when  the  algorithm  determining  whether  Q(r,  i)  D  enabled{ai)  is  valid  does 
not  run  in  polynomial  time  (although  the  problem  must  still  be  decidable). 
In  this  case,  however,  the  test  is  guaranteed  to  run  in  polynomial  space  only 
if  this  computation  can  be  performed  using  polynomial  space.  The  most 
important  consequence  of  Lemma  3.6,  however,  is  that  practical  simultaneous 
choice  problems  have  polynomial-space  optimal  protocols. 

Theorem  3.7:  If  C  is  an  implementable  practical  simultaneous  choice  prob¬ 
lem,  then  there  is  a  polynomial-space  optimal  protocol  for  C. 

With  Theorem  3.7  we  see  that  practicsd  simultaneous  choice  problems 
do  have  effective  optimal  protocols.  In  general,  however,  connected  com¬ 
ponents  in  the  similarity  graph  may  be  of  exponential  size,  and  paths  in 
such  components  may  be  of  exponential  length.  It  therefore  follows  that 
the  polynomial-space  protocol  given  by  Theorem  3.7  requires  the  processors 
to  perform  exponential-time  computations  between  consecutive  rounds  of 
communication.  The  resulting  protocol  is  therefore  clearly  not  a  reasonable 
protocol  to  use  in  practice.  A  crucial  question  at  this  point  is  whether  there 
are  efficient  optimal  protocols  for  simultaneous  choice  problems.  Recall  that 
we  have  already  seen  that  optimal  protocols  can  be  implemented  in  a  way 
that  makes  efRcient  use  of  communication.  The  rest  of  the  chapter  is  de¬ 
voted  to  investigating  ways  of  implementing  tests  for  common  knowledge  in 
variants  of  the  omissions  model  in  a  computationally-eiRcient  manner,  and 
therefore  of  implementing  efficient,  optimal  protocols  for  simultaneous  choice 
problems  in  these  models. 

3.5.1  The  Omissions  Model 

Recall  that  in  the  omission  model  a  faulty  processor  may  fail  only  by  failing 
to  send  some  of  the  messages  its  protocol  requires  it  to  send.  In  this  sec¬ 
tion  we  consider  the  problem  of  efficiently  implementing  tests  for  common 
knowledge  in  the  omissions  failure  model.  In  particular,  we  develop  a  con¬ 
struction  that  crisply  characterizes  the  connected  component  of  a  point  in 
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the  similarity  graph.  This  construction  determines  a  subgraph  of  the  labeled 
communication  graph  with  the  property  that  two  points  are  similar  iff  their 
respective  subgraphs  are  identical.  As  stated  in  Theorem,  2.3,  the  connected 
component  of  a  point  in  the  similarity  graph  completely  determines  what 
facts  are  common  knowledge  at  that  point.  As  a  result,  this  construction 
enables  us  to  devise  efficient  tests  for  common  knowledge,  and  hence  efficient 
protocols  for  simultaneous  choice  problems  that  are  optimal  in  all  runs. 

Dwork  and  Moses  address  in  [DM86]  the  problem  of  implementing  tests 
for  common  knowledge  in  the  crash  failure  model  In  the  crash  failure  model, 
processors  fail  by  crashing;  that  is,  faulty  processors  may  successfully  send 
messages  to  some  processors  during  their  failing  round,  but  will  not  success¬ 
fully  send  any  messages  in  any  later  round.  As  a  result,  a  faulty  processor 
is  “out  of  the  game”  after  its  failing  round,  and  no  longer  contributes  to  the 
knowledge  of  the  remaining  processors.  The  analysis  performed  by  Dwork 
and  Moses  focuses  on  the  notion  of  a  clean  round,  a  round  in  which  no  pro¬ 
cessor  failure  is  discovered.  In  runs  of  a  full-information  protocol,  a  clean 
round  ensures  that  all  nonfaulty  processors  receive  the  same  set  of  messages. 
After  such  a  round,  all  nonfaulty  processors  have  an  identical  view  of  the  part 
of  the  run  that  precedes  the  clean  round.  Dwork  and  Moses  show  that  facts 
about  the  initial  configuration  become  common  knowledge  exactly  when  it 
becomes  common  knowledge  a  clean  round  has  occurred.  Dwork  and  Moses 
complete  their  analysis  by  characterizing  when  this  happens.  In  the  omis¬ 
sions  model,  however,  because  a  faulty  processor  need  not  remain  silent,  or 
crawh,  after  first  failing  to  send  a  message),  a  faulty  processor  may  continue 
to  contribute  to  the  knowledge  of  the  nonfaulty  processors,  even  after  its  first 
failing  round.  The  situation  is  therefore  more  complicated,  and  clean  rounds 
no  longer  play  the  same  role  here  as  they  do  in  the  crash  failure  model.  Fur¬ 
thermore,  to  the  best  of  our  understanding,  there  is  no  direct  anedog  to  the 
notion  of  a  clean  round  in  the  omissions  model.  The  approach  used  by  Dwork 
and  Moses  in  the  crash  failure  model,  therefore,  does  not  seem  to  extend  to 
this  model.  As  a  result,  we  are  forced  to  take  a  different  approach.® 


^As  mentioned  in  the  introduction,  since  the  technical  details  of  the  proofs  in  this 
section  may  make  it  difficult  to  obtain  a  high-level  understanding  of  our  approach,  we 
encourage  the  reader  to  skip  the  proofs  on  the  first  reading. 
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The  Basic  Steps 

We  now  give  what  will  become  the  two  basic  steps  of  our  test  for  conunon 
knowledge  during  runs  of  a  full-information  protocol  in  the  omissions  model. 
(Unless  otherwise  mentioned,  all  protocols  referred  to  in  this  section  will 
be  full-information  protocols.)  Our  approach  to  the  problem  of  testing  for 
common  knowledge  is  motivated  by  a  careful  analysis  of  what  facts  do  not 
become  common  knowledge.  We  begin  with  a  technical  result,  similar  to 
Lemma  15  of  [DM86],  saying  that  two  points  are  similar  if  they  differ  only 
in  the  faulty  behavior  exhibited  by  a  single  processor  in  the  last  few  rounds. 

Throughout  the  remainder  of  this  chapter  it  will  be  convenient  to  refer 
to  runs  differing  only  in  some  aspect  of  their  operating  environments.  Given 
two  runs  r  and  r'  of  a  protocol  we  will  say  that  r  differs  from  r'  only  in  a 
certain  aspect  of  the  operating  environment  if  r  is  the  result  of  executing  !F  in 
an  operating  environment  that  differs  from  that  of  r'  only  in  the  said  aspect. 
Notice  that  while  their  operating  environments  may  be  similar,  the  messages 
sent  in  the  two  runs  may  actually  be  quite  different.  We  say  that  a  processor 
is  silent  from  time  k  if  it  fails  to  send  every  message  in  every  round  following 
time  k. 

Lemma  3.8:  Let  r  and  r'  be  runs  differing  only  in  the  (faulty)  behavior 
displayed  by  processor  p  after  time  k,  and  suppose  no  more  than  /  processors 
fail  in  either  r  or  r'.  lit  —  k<t  +  l  —  f,  then  (r,  1)  ~  (r',/). 

Proof;  If  fc  >  f  then  Q{r,t)  =  G{r’,l),  and  Lemma  3.5  implies  that 
(r,f)  ~  (r',f).  Therefore,  assume  k  <  1.  We  proceed  by  induction  on 
j  =  £  —  k.  Without  loss  of  generality,  we  may  assume  that  r  and  r'  ac¬ 
tually  differ  in  the  faulty  behavior  of  p,  and  hence  that  p  fails  in  one  of  these 
runs.  Notice  that  since  p  already  fails  in  one  of  these  runs  and  yet  no  more 
than  /  processors  fail  in  either  run,  it  is  clear  that  at  most  f  <t  processors 
fail  in  any  run  difFering  from  either  run  only  in  the  faulty  behavior  of  p. 

Suppose  j  =  1  (that  is,  k  =  £  —  1).  Since  t  <  n  —  2  and  since  r  and  r' 
differ  only  in  the  behavior  of  p,  there  are  two  processors  qi  and  q2  (other 
than  p)  that  do  not  fail  in  either  run.  Let  rj  be  the  run  differing  from  r  only 
in  that  p  sends  to  92  during  round  t  of  r2  iff  it  does  so  in  r'  (and  notice  that  r2 
may  actually  be  equal  to  r).  Since  qis  local  state  at  time  I  is  independent  of 
whether  p  sends  to  ^2  during  round  /,  we  have  {r,l)  ~  {r2,l).  Since  G{r2,£) 
and  Q{r',  £)  differ  only  in  the  messages  that  p  sends  to  processors  other  than  32 
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in  round  t,  and  ga’s  local  state  at  {r2,l)  is  independent  of  whether  p  sends 
to  the  remaining  processors  during  round  I,  we  have  (r2,^)  ~  Thus, 

by  the  transitivity  of  we  have  {r,l)  ~(r',4 

Suppose  i  >  1  (that  is,  A  <  /  —  1)  and  the  inductive  hypothesis  holds 
for  i  —  1.  Let  Ti  be  the  run  differing  from  r  only  in  that  for  each  processor  q 
in  {Pi, . . .  ,Pi}  processor  p  sends  to  q  during  round  A  +  1  in  rj  iff  it  does  so 
in  t'.  Notice  that  r  =  tq.  We  will  show  that  (r,^)  ~  (^•»)^)  for  all  t  >  0. 
Since  Tn  differs  from  r'  only  in  the  faulty  behavior  of  p  after  time  fc  +  1, 
and  since  £  —  (A  +  1)  =  j  —  1,  it  will  follow  by  the  inductive  hypothesis  for 
j  -  1  that  (rn,^)  ~  (^’^^)•  Finally,  by  the  transitivity  of  we  will  have 
[r,t)  ~  (r',  1)  as  desired. 

We  now  proceed  by  induction  on  i  to  show  that  (r,  t)  ~  (r,-,  t)  for  all  z  >  0, 
The  case  of  i  =  0  is  trivial.  Suppose  i  >  0  and  the  inductive  hypothesis 
holds  for  i  —  1;  that  is,  (r,^)  ~  (Pi_i,^).  Notice  and  differ  at  most 
in  whether  p  sends  a  message  to  p;  during  round  +  1.  Let  a  be  the  run 
differing  from  in  that  Pi  is  silent  from  time  k+lin.  a.  Suppose  no  more 
than  g  processors  fail  in  either  or  a.  Notice  that  g  <  f  +  L  Therefore, 
since  1<1  — k<t  +  l  —  fwe  have  f  <t  and  S'  <  t,  so  at  most  t  processors 
fail  in  a.  Furthermore,  £  —  (fc  +  l)<t  +  l  —  (/  +  l)<t  +  l  —  Since,  in 
addition,  rj-i  and  a  differ  only  in  the  faulty  behavior  of  p,-  after  time  +  1, 
the  inductive  hypothesis  for  j  —  1  implies  (r,_i,^)  {8,1).  Now,  since  pi  is 

silent  from  time  fc  -4- 1  in  s,  the  local  state  of  a  nonfaulty  processor  at  {3,1) 
is  independent  of  whether  p  sends  to  p,-  during  round  k  +  l,so  {a,t)  ~  {3',i) 
where  a'  differs  from  a  in  that  p  sends  to  p;  during  round  fc  + 1  in  s'  iff  it  does 
so  in  r,-.  Again,  the  inductive  hypothesis  for  j  —  1  implies  that  (s',  t)  ~  (r;,  t). 
By  the  transitivity  of  it  follows  that  {r,£)  ~  (r^,/).  □ 

While  Lemma  3.8  is  a  technical  lemma  in  the  context  of  this  work,  it 
has  a  number  of  interesting  consequences  in  its  own  right.  In  particular,  the 
(t +  l)-round  lower  bound  on  the  number  of  rounds  required  for  simultaneous 
Byzantine  agreement  is  an  immediate  corollary  of  this  lemma.  The  resulting 
proof  of  this  lower  bound  is  perhaps  the  simplest  to  appear  in  the  litera¬ 
ture  (see  [DM86]  for  details).  More  important  for  our  purposes,  however, 
is  the  fact  that  two  corollaries  of  Lemma  3.8  enable  us  to  characterize  the 
connected  components  of  the  similarity  graph.  Consider  the  runs  ti  and  vz 
of  Figure  3.2,  where  we  indicate  only  faulty  behavior:  solid  lines  indicate 
silence,  and  dashed  lines  indicate  sporadic  faulty  behavior.  Notice  that  / 
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Figure  3.2:  Runs  illustrating  Lemma  3.9. 


processors  fail  in  ri.  In  the  following  lemma  we  show  that  ~ 

where  r2  differs  from  rj  only  ii;  that  processors  failing  in  r\  are  silent  in  T2 
from  time  fc,  where  =  ^  —  (t  +  1  —  /).  This  is  the  first  bwic  step  of  oux 
test  for  common  knowledge. 

Lemma  3.9:  Let  ri  be  a  run  in  which  /  processors  fail.  Let  rj  be  the  run 
differing  from  ri  only  in  that  processors  failing  in  ri  are  silent  from  time  k 
in  r2,  where  A:  =  ^  -  (t  +  1  -  /).  Then  (n,/)  ~  (r2,/). 

Proof:  Let  qi,...,qf  be  the  faulty  processors  in  ri.  Let  Si  be  the  run 
differing  from  ri  in  that  processors  qi,...,qi  are  silent  from  time  k  in  8i. 
Notice  that  r\  =  so  and  r,  =  Sf.  We  proceed  by  induction  on  t  to  show  that 
(ri,/)  ~  (si,/)  for  all  i.  The  case  of  t  =  0  is  trivial.  Suppose  i  >  0  and  the 
inductive  hypothesis  holds  for  i  -  1;  that  is,  (ri,/)  ~  («{_!,/).  Since  s,_i 
and  Si  differ  at  most  in  the  faulty  behavior  of  qi  after  time  it  follows 
by  Lemma  3.8  that  ~  By  the  transitivity  of  we  have 

□ 

One  interesting  consequence  of  this  result,  for  example,  is  that  the  states 
at  time  k  of  processors  failing  in  ri  are  not  common  knowledge  at  time  i. 
To  see  this,  let  F  be  the  set  of  processors  failing  in  ri,  and  suppose  it  is 
common  knowledge  at  (ri,l)  that  “the  joint  view  of  F  at  time  k  is  equal 
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to  This  means  that  this  statement  is  true  at  all  points  in  (ri,^)’s 

connected  component.  But  let  rj  and  r'^  be  runs  differing  from  ri  amd  r2 
only  in  that  some  already-faulty  processor  p  E  F  fails  to  send  to  another 
already-faulty  processor  q  E  F  during  round  k.  Notice  that  the  joint  view 
of  F  at  time  k  in  r[  is  not  equal  to  rip(k).  Yet  according  to  our  lemma, 
(ri,£)  ~  (r2,/)  and  (ri,f)  ~  (r^,  f);  and  since  the  processors  in  F  are  silent 
from  time  fc,  the  points  (r2,  and  {r^,  1)  are  indistinguishable  to  all  nonfaulty 
processors;  and  so  {r2,t)  ~  which  implies  (ri,^)  ~  (’’i.  t),  and  hence 

and  are  in  the  same  connected  component!  Consequently,  the 

time  k  views  of  processors  in  the  set  F  cannot  be  common  knowledge  at 
(ri,/).  Interestingly,  our  next  result  will  show  that  even  the  identity  of  F 
itself  (the  identity  of  the  faulty  processors)  may  not  be  common  knowledge 
at  (rui). 

Before  discussing  the  second  lemma,  however,  we  madce  an  important 
definition.  Given  a  point  (r,  k)  and  a  set  of  processors  G,  let 

B{G,  r,  k)  {p  :  (r,  k)  |=  lai  ‘^p  is  faulty”)} . 

By  this  definition,  B{G,T,k)  is  the  set  of  processors  implicitly  known  by  G 
at  (r,  k)  to  be  faulty.  An  important  property  of  the  omissions  failure  model 
is  that  processors  fail  only  by  failing  to  send  messages.  It  follows  that  G 
implicitly  knows  at  (r,  k)  that  a  processor  p  is  faulty  iff  G  implicitly  knows 
at  (r,  k)  of  some  processor  g  not  receiving  a  message  from  p  at  time  k  or 
earlier;  that  is,  0g(^>  contains  no  edge  from  (p,  f  —  1)  to  (g,  /}  for  some 
node  (q,£)  of  Uoir,  k).  It  is  therefore  simple  and  straightforward  to  compute 
B(G,  r,  k)  given  Qair,  k). 

The  essence  of  the  second  lemma  is  captured  by  the  runs  r2  and  of 
Figure  3.3.  In  the  run  r2,  the  /  faulty  processors  are  silent  from  time  k  = 
(f +1— /).  The  set  G  is  the  set  of  nonfaulty  processors  and  B  =  B{G,  r2,  k). 
The  run  differs  from  r2  only  in  that  processors  in  P  —  B  do  not  fail  in  r^. 
The  following  lemma  states  that  (r2,^)  ~  (t’s,^).  This  implies,  for  instance, 
chat  the  failure  of  processors  in  P— B  cannot  be  common  knowledge  at  (r2,/) 
since  they  do  not  fail  in  rz.  Formally,  the  second  basic  step  of  our  test  for 
common  knowledge  can  be  stated  as  follows  (see  Figure  3.3): 

Lemma  3.10:  Let  r2  be  a  run  in  which  the  /  faulty  processors  are  silent 
from  time  k  =  £  —  {t  +  1  —  f).  Let  G  be  the  set  of  nonfaulty  processors  in 
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The  run  rj.  The  run  r^. 

Figure  3.3:  Runs  illustrating  Lemma  3.10. 


r2,  and  let  B  =  B{G,T2^k).  Let  tz  be  the  run  dilFering  from  r2  only  in  that 
processors  in  P  -  B  do  not  fail.  Then  (r2,£)  ~  (ra,^). 


Proof:  If  a  processor  p  in  P  —  B  fails  to  a  processor  q  during  some  round 
j  <koi  Tz  (in  which  case  it  must  be  that  p  G  P—B—G),  then  the  node  {q,j) 
must  not  be  a  node  of  ^g(^2)  k)  or  the  failure  of  p  would  be  implicitly  known 
by  G  at  time  k  and  p  would  be  in  B,  a  contradiction.  Thus,  Qairz,  k)  is 
independent  of  whether  ^(r2,  k)  contains  an  edge  from  p  to  g  during  round  j. 
Let  rj  be  a  run  differing  from  r2  only  in  that  no  processor  in  P  —  B  fails 
before  time  k  in  rj.  By  the  previous  discussion,  ^g(^2>  ^)-  la 

both  r2  and  rj  every  processor  in  G  successfully  sends  every  message  after 
time  k  and  every  processor  in  P  —  G  is  silent  from  time  k.  Since,  in  addition, 
every  processor  in  G  receives  the  same  input  after  time  k  in  and  rj,  we 
have  ^g(t’2)0  =  ^g(7'2>^)-  Given  that  G  is  the  set  of  nonfaulty  processors 
in  r2,  each  of  which  is  also  nonfaulty  in  r^,  it  follows  by  Lenuna  3.5  that 
{T2,i)  ~  (t’2>^)-  Since  the  runs  rj  and  Tz  differ  only  in  the  faulty  behavior  of 
processors  in  P  —  B  after  time  ft,  by  repeated  application  of  Lemma  3.8  it 
follows  that  (7*2,^)  ~  (^•3,^)-  Hence,  (rz,!)  ~  (t'3,/).  □ 
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Characterizing  the  Similarity  Graph 

Let  us  now  consider  how  these  two  basic  steps,  Lemmas  3.9  and  3.10,  can 
be  used  to  characterize  the  connected  components  of  the  similarity  graph, 
and  hence  what  facts  are  common  knowledge  at  a  given  point.  Going  back 
to  Figures  3.2  and  3.3,  notice  that  if  /'  <  /  (which  implies,  referring  to  Fig¬ 
ure  3.3,  that  not  all  /  processors  failing  in  ri  are  implicitly  known  at  time 
fc  =  ^  —  (t  -f  1  —  /)  to  be  faulty),  then  by  setting  r[  =  ra  we  can  apply  Lem¬ 
mas  3.9  and  3.10  again  (this  time  starting  from  instead  of  t\).  Iterating 
this  process,  we  reach  a  run  f  satisfying  ~  (r,^),  where  the  /  proces¬ 
sors  failing  in  f  are  silent  from  time  k  =  I  —  {t  +  1  —  f),  and  where  all  faulty 
processors  are  implicitly  known  to  be  faulty  by  the  nonfaulty  processors  at 
(f ,  k).  This  run  f  is  a  fixpoint  of  this  iterative  process;  setting  fi  =  f ,  the 
runs  f2  and  constructed  in  Lemmas  3.9  and  3.10  are  identical  to  f.  It  is 
the  joint  view  of  the  nonfaulty  processors  at  (f ,  k),  we  will  show,  that  charac¬ 
terizes  the  connected  component  of  (ri,^)  in  the  similarity  graph,  and  hence 
what  facts  are  common  knowledge  at  (ri,^).  To  enable  ourselves  to  turn  this 
characterization  into  a  test  for  common  knowledge  individual  processors  can 
compute  locally,  we  now  define  a  local  version  of  this  iterative  process,  illus¬ 
trated  in  Figure  3.4,  that  individual  processors  can  use  to  construct  locally 
this  joint  view. 

Given  a  point  (r,/)  and  a  processor  p,  this  construction  is  defined  as 
follows.  Define  Go  =  {p}  and  ko  =  t,  and  define  Gi+i  and  ki+i  inductively 
as  follows.  Denoting  B{Gi,T,k^  by  Bi,  let 

Gi+i  =  P-Bi 


fci+i  =  /  —  (t  +  1  — 

One  should  ask  what  happens  to  this  construction  when  becomes  nega¬ 
tive.  Recall  that  when  Aj+i  <  0,  the  local  state  at  time  ki+i  of  every  processor 
in  Gi^i  is  the  distinguished  empty  local  state.  It  follows  that  when  ki+i  <  0, 
the  set  Bi+i  must  be  empty.  As  a  consequence,  for  all  j  >  i  -b  1,  we  have 
that  Gj  =  P,  kj  =  i  —  {t  +  1),  and  Bj  is  empty. 

While  we  claim  this  is  a  construction  each  processor  can  perform  locally, 
the  set  5(G„  r,  is  defined  in  terms  of  r,  which  indi'ridual  processors  can¬ 
not  possibly  know.  We  will  soon  show,  however,  that  individual  processors 
have  enough  information  in  their  local  state  to  compute  B(Gi,r,  ki)  without 
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Figure  3.4:  An  example  of  the  construction  when  t  —  9. 


knowing  the  precise  identity  of  r,  and  hence  can  perform  the  steps  of  this 
construction  locally. 

The  construction  determines  three  (infinite)  sequences  {A?,*},  and 
{Bi}.  In  the  next  few  pages  we  will  see  that  these  sequences  have  limits 
G,  k,  and  B,  and  that  these  limits  are  independent  of  the  processor  with 
which  the  construction  is  begun.  As  a  result,  individual  processors  will  be 
able  to  construct  these  values  based  solely  on  their  local  state.  We  will 

A  A 

see  that  the  joint  view  of  G  at  time  k  completely  characterizes  the  connected 
component  of  (r,  i)  in  the  similarity  graph,  and  hence  what  facts  are  common 
knowledge  at  (r,^).  This  construction  will  therefore  provide  an  efficient  way 
of  determining  what  facts  are  common  knowledge  at  a  given  point. 

Among  other  things,  this  construction  captures  a  number  of  essential 
aspects  of  the  information  flow  during  the  run  up  to  time  i.  In  particular, 
one  important  property  of  this  construction  is  the  following: 

Lemma  3.11:  Every  processor  in  Gi+i  successfully  sends  to  every  processor 
in  Gi  in  every  round  before  time  A:,-. 

Proof:  Suppose  some  processor  q  of  fails  io  send  to  a  processor  q'  of  Gi 
during  a  round  before  time  ki.  Then  g’s  failure  to  q'  is  implicitly  known  by  Gi 
at  time  ki,  so  q  €  Bi  and  q  ^  G»+i,  a  contradiction.  □ 
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One  consequence  of  Lemma  3.11  is  that  the  local  state  of  the  processor  p 
at  time  t  must  contain  the  local  state  of  every  processor  in  Gi  at  time  ki 
for  every  i  >  0.  One  property  of  the  construction,  therefore,  is  that  the 
construction  depends  only  on  the  local  state  of  processor  p  at  and 

hence  that  p  is  able  to  perform  the  construction  locally.  This  property  is 
essential  in  order  to  use  this  construction  in  a  test  for  common  knowledge 
that  p  can  perform  locally.  A  second  essential  property  of  the  construction  is 
that  it  converges  within  t  +  1  iterations,  as  we  see  with  the  following  result. 


Lemma  3.12:  lim  Gi  =  Gt+i  and  lim  ki  =  kt+i. 

t*-»oo  t— fOO 


Proof:  We  will  show  that  Bi+i  C  Bi  for  edl  i  >  0.  Since  Bo  contains  at 
most  t  processors,  it  will  then  follow  that  there  must  be  an  t  <  t  for  which 
Bi  =  Bi+i.  From  the  definition  of  the  construction,  it  is  easy  to  see  that  we 
will  have  Bi  =  Bi^j  for  all  j  >  0.  In  addition,  we  will  have  Gi+i  =  G<+i+j 
and  ki+i  =  fci+i+j  for  all  j  >  0,  amd  we  will  be  done.  We  proceed  by  induction 
ont.  Iffc,-+i  <0,  then  Bi+i  is  empty  and  Bi+i  C  so  let  us  assume  ki+i  >  0. 
Suppose  1  =  0.  By  Lemma  3.11,  every  processor  in  Gi  must  send  to  every 
processor  in  Go  during  roimd  ki  +  1.  It  follows  that  any  failure  implicitly 
known  by  Gi  at  time  ki  must  be  implicitly  known  by  Go  at  time  ko.  Thus, 
Bi  C  Bo-  Suppose  i  >  0  and  the  inductive  hypothesis  holds  for  t  —  1;  that 
is,  Bi  C  5,_i.  If  Bi  —  5,-1,  then  =  Bi-  If  Bi  C  B,-i,  then  fej+i  <  ki- 
By  Lemma  3.11,  G;+i  sends  to  Gi  during  round  ki+i  +  1,  so  Bi+i  C  Bi-  □ 

We  denote  the  results  of  the  construction  (the  limits  of  the  sequences 
{Gi},  {ki},  and  {5i})  by  G,  k,  and  B.  We  denote  these  values  by  G{p,r,t), 
k{p,  r,  £),  and  B{p,  r,  t)  when  the  processor  p  and  the  point  (r,  t)  are  not  clear 
from  context.  We  now  show,  however,  that  these  values  are  independent  of 
the  processor  p. 


Lemma  3.13:  G{p,r,t)  =  G{q,r,i)  and  k{p,T,l)  =  k{q,T,l)  for  all  proces¬ 
sors  p  and  q. 


Proof:  We  prove  the  claim  by  showing  that  B{p,r,l)  =  B{q,r,t).  Given 
that  Bi  uniquely  determines  Gj+i  and  fcj+i,  this  will  imply  the  desired  result. 
It  suffices  to  show  that  B{p,r,i)  C  B{q,T,t),  since  the  other  direction  will 
follow  by  symmetry.  Denote  the  intermediate  results  of  the  construction 
from  the  point  {r,i)  starting  with  the  processor  p  by  Gi,  ki,  and  Bi,  and  the 
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A  A  A 

nnal  results  by  G,  k,  and  B.  Similarly,  denote  the  intermediate  results  of 
the  construction  sttirting  with  q  by  (?{,  and  B-^  and  the  final  results  by 
G',  k',  and  B'.  We  now  show  that  B  Q  B'.  If  fc  <  0,  then  B  is  empty  and 

A  A  A  A 

B  C  B\  so  assume  A  >  0.  We  consider  two  cases.  First,  suppose  k  =  i  —  1. 
In  this  case,  B  must  contain  t  faulty  processors  since  A  =  ^  —  (<  +  l  —  |J5|). 
It  follows  that  every  processor  in  G  must  be  nonfaulty  in  r  and  hence  must 
send  to  Gq  during  round  A  +  1,  so  B  C  Bq.  Since,  in  addition,  |J5ol  <t  and 
|.S|  =  t,  we  have  B  =  Bq.  It  follows  from  the  construction  that  B  =  B-  for 
every  i  >  0,  and  hence  that  B  =  B*. 

Now,  suppose  A  <  ^  —  1.  Let  q'  be  an  (arbitrary)  nonfaulty  processor 
in  r.  We  claim  that  every  processor  in  G  must  send  its  local  state  to  q' 
during  round  A  +  1.  Suppose  some  processor  p  in  G  does  not.  Let  j  be  the 
least  integer  such  that  G  =  Gj.  If  j  =  1,  then  q'  must  send  to  Go  during 
round  A  +  2.  If  j  >  1,  then  q'  must  actually  be  a  member  of  G,_i  since  G,_i 
must  contain  all  of  the  nonfaulty  processors.  In  either  case,  the  failure  of  g 
to  q'  during  round  A  +  1  must  be  implicitly  known  by  Gj-i  at  time  A,_i,  so 
g  €  Bj-i.  Since  G  =  Gj  =  P  —  5,_i,  we  have  g  ^  G,  &  contradiction.  Thus, 

A  A 

every  processor  in  G  must  send  to  q'  during  round  A  +  1. 

We  now  proceed  by  induction  on  t  to  show  that  B  C  B-  for  all  t  >  0. 
Suppose  t  =  0.^  Every  processor  in  G  must  send  to  the  nonfaulty  processor  q' 
during  round  A  +  1,  and  q'  must  send  to  Gq  during  round  k  +  2,  so  B  C  Bq. 
Suppose  t  >  0  and  the  inductive  hypothesis  holds  for  i  —  1;  that  is,  B  C  Bl_j. 
HB  =  Bl_i,  then  B  =  B-.  IfB  C  B-_i,  then  A  <  k-  since  A  =  /—  (t + 1  —  |.B|) 
and  A;  =  ^  -  (f  + 1  —  |).  Every  processor  in  G  must  send  to  the  nonfaulty 

A  A 

processor  q'  during  round  A  +  1,  and  q'  must  be  contained  in  GJ,  so  B  C  B-. 
It  follows  that  B  C  Bi  for  all  i  >  0,  and  hence  B  C  B'.  □ 

AAA 

As  a  result  of  Lemma  3.13,  we  see  that  G,  A,  and  B  depend  only  on  the 
point  (r,/),  and  not  the  processor  with  which  the  construction  begins.  Thus, 
a  third  property  of  this  construction  is  that  every  processor  (and  not  just  the 
nonfaulty  processors)  is  able  to  compute  locally  the  values  of  G,  A,  and  B. 
The  ability  of  the  nonfaulty  processors  to  compute  these  values  locally  will 
be  essential  to  designing  a  locally-computable  test  for  common  knowledge. 
We  will  denote  these  values  by  G{r,l),  k{r,l),  and  B{r,t)  when  (r,/)  is  not 


clear  from  context.  From  the  definition  of  the  construction  it  is  clear  that  the 
driving  force  behind  the  construction  is  the  identity  of  the  sets  Bi.  Notice 
that  these  sets  are  uniquely  determined  by  the  failure  pattern,  and  do  not 
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depend  on  the  run’s  input.  Taking  into  account  the  input  of  a  run,  we  are 
now  in  a  position  to  show  how  the  construction  characterizes  the  connected 
components  in  the  similarity  graph.  Denoting  (?(r,  /)  by  G  and  k{r,  1)  by  k, 
we  define 

Hr,l)  =  rS). 

This  definition  says  that  ^(r,/)  is  the  joint  view  of  the  processors  in  G(r,/) 
at  time  k(r,  £).  Our  next  lemma  states  that  V  is  the  same  at  similar  points, 
which  implies  that  the  joint  view  V(r,i)  is  common  knowledge  at  (r,/). 

Lemma  3.14:  If  (r,£)  ~  (r\i)  then  V(r,£)  = 

Proof:  We  proceed  by  induction  on  the  distance  d  between  the  points 

(r,^)  and  The  case  of  d  =  0  is  trivial.  Suppose  that  d  >  0  and  the 

inductive  hypothesis  holds  for  d  —  1.  Since  the  distance  between  and 
(r,t)  is  d,  there  must  be  a  point  {s,t)  whose  distance  from  (r,/)  is  d  —  1, 
and  whose  distance  from  (t’,1)  is  1.  The  inductive  hypothesis  implies  that 
^(r,^)  =  1^(3,^),  and  we  have  v{p,Sjl)  =  v{p,r',i.)  for  some  processor  p.  As 
a  consequence  of  Lemmas  3.11  and  3.13,  the  values  of  F(5,^)  and  V{T',t) 
depend  only  on  the  local  state  of  p  at  (s,/)  and  respectively.  Since 

p  has  the  same  local  state  at  {s,t)  and  at  (r',^),  we  have  ^(s,^)  =  V{r',l). 
Since  V{r,t)  =  ^(a,^),  it  follows  that  V{r,t)  =  V{r',l).  □ 

One  consequence  of  Lemma  3.14,  together  with  Lemma  3.5  and  the  def¬ 
inition  of  V  above,  is  that  if  (r,^)  ~  (r',/),  then  Go{r,k)  -  Go{r',k)-  We 
will  find  this  a  useful  fact  when  proving  the  converse  of  Lemma  3.14;  that 
is,  that  all  points  with  the  same  V  are  similar,  and  hence  that  V  completely 
characterizes  the  connected  components  of  the  similarity  graph.  Before  we 
do  so,  however,  let  us  formalize  the  intuition  that  led  us  to  the  construction 
in  the  first  place  (the  use  of  the  two  basic  steps  in  the  construction  given  by 
Lemmas  3.9  and  3.10). 

A  A  A 

Lemma  3.15:  Let  r  be  a  run,  and  let  G,  k,  and  B  be  the  results  of  the 
construction  from  (r,/).  Let  r'  be  the  run  differing  from  r  only  in  that 
processors  in  G  do  not  fail  in  r'  and  processors  in  B  are  silent  from  time  k 
in  r'.  Then  {r,t)  ~  (/,/). 

Proof:  Let  Gi,  ki,  and  Bi  be  the  intermediate  results  of  the  construction 
from  {r,t)  starting  with  the  nonfaulty  processor  pj.  For  t  >  0,  define  Vi  to 
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be  the  run  differing  from  the  run  r  only  in  that  processors  in  Bi  are  silent 
from  time  ki  in  Ti  and  the  remaining  processors  do  not  fail  in  rj.  Notice 
that  r'  =  Ti  for  sufficiently  large  i.  We  proceed  by  induction  on  i  to  show 
that  (r,£)  ~  (r,)£)  for  all  i  >  0.  Suppose  t  =  0.  Since  the  subgraph  Qj{r,t) 
must  be  independent  of  whether  the  graph  Q{r,  i)  is  missing  an  edge  from  a 
processor  in  P— 5o  to  a  processor  other  than  Pj,  we  have^j(r,^)  =  Gjivo,  ko)- 
Since  processor  pj  is  nonfaulty,  it  follows  that  (r,  £)  ~  (ro,  /).  Suppose  i  > 
0  and  the  inductive  hypothesis  holds  for  i  —  1;  that  is,  (r,/)  ~ 

Lemma  3.9  implies  ~  W-i.  £)  where  r|_i  differs  from  rj-i  in  that 

processors  in  Bi-i  (the  processors  failing  in  ri_i)  are  silent  from  time  ki 
in  r|_i.  Lemma  3.10  implies  ~  (r,-,/).  Thus,  (r,£)  ~  (r,-,£).  □ 

Finally,  we  have  the  following: 

Lemma  3.16:  If  V(r,£}  =  V(t',£)  then  (r,/)  ~  (r\£). 

Proof:  The  fact  V(r,£)  =  V(r',£)  implies  G(r,£)  =  G(r\£),  k{r,£)  = 

k{r',£)y  and  B{r,£)  =  B{r*,£).  We  therefore  denote  these  values  by  G,  fc, 
and  B.  Let  s  be  a  run  that  differs  from  r  in  that  processors  in  G  do  not 

A  A 

fail  in  s,  and  processors  in  B  are  silent  from  time  k  in  s.  Let  s'  be  an 
analogous  run  with  respect  to  r'.  Lemma  3.16  implies  that  (r,^)  (a,/)  and 

(r',^)  ~  {s',£).  In  order  to  show  that  (r,/)  ~  {t',£),  it  is  enough  to  show 
that  (s, ~  (s', £).  Suppose  G  =  {qfi, . . . ,  and  let  s,-  be  the  run  differing 
from  s  in  that  qi,...,qi  receive  the  same  input  eifter  time  A  in  Si  as  they  do 
in  s'.  We  proceed  by  induction  on  i  to  show  that  (s,/)  ~  {3i,£)  for  all  t  >  0. 
Since  s  =  sq,  the  case  of  i  =  0  is  trivial.  Suppose  i  >  0  and  the  inductive 
hypothesis  holds  for  z  —  1;  that  is,  (a,^)  ~  (s;_i,^).  Let  «,•_!  and  «,•  be  runs 
differing  from  s,_i  and  s;,  respectively,  only  in  that  g<  is  silent  from  time  k 
in  «f_i  and  Lemma  3.8  implies  (s,_i,^)  ~  (u,-_i,  £)  and  (si,£)  ~  (u»,^)-  Ip 
addition,  since  xii_i  and  Ui  differ  oidy  in  the  input  received  by  qt  after  time  k, 
and  since  qi  is  silent  from  time  k  in  both  runs,  we  have  (ui_i,/)  ~  (u»>^)- 
Thus,  {a,£)  ~  (sj,/)  for  all  i  >  0.  In  particular,  (s,£)  ~  («m>^)*  In  order 
to  complete  the  proof,  it  now  suffices  to  show  that  (sm,£)  ~  (a',^).  Since 
^(5(r,k)  =  G,i(r',k),  (r,£)  ~  (s,^),  and  (r',£)  ~  (s',  /),  Lemma  3.14  implies 
that  G^(s,k)  =  G($(y,k).  Notice  that  G(s(im,k)  =  ^,5(5,  ft)  =  Gis(s',k). 
Notice,  in  addition,  that  processors  in  G  do  not  fail  in  either  s_  or  s',  and  that 
the  remaining  processors  (in  B)  are  silent  from  time  ft  in  both  runs.  Finally, 
notice  that  processors  in  G  receive  the  same  input  after  time  ft  in  both  runs. 
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It  follows  that  and  hence  that  Thus, 

/),  as  desired.  □ 

Combining  Lemmas  3.14  and  3.16  we  see  that  (r,^)  {r',t)  iff  V(r,^)  = 

We  therefore  have: 

Theorem  3.17:  (r,/)  |=  C^tp  iff  {r',£)  |=  y?  for  all  r’  satisfying  F(r,^)  = 

V{r',t). 

Consequently,  our  local  construction  completely  characterizes  the  connected 
components  of  the  similarity  graph,  and  hence  when  facts  become  common 
knowledge. 


A  Test  for  Common  Knowledge 


We  now  consider  how  this  chsiracterization  gives  rise  to  a  test  for  common 
knowledge  that  processors  can  compute  locally. 

From  Theorem  3.17,  it  follows  that  V(r,^)  in  a  precise  sense  summarizes 
and  uniquely  determines  the  set  of  facts  that  are  common  knowledge  at  any 
given  point  (r,/).  The  identity  of  V  has  two  components:  the  failure  pattern 
and  input  pattern  during  some  prefix  of  the  run.  The  fact  that  V  becomes 
common  knowledge  implies  that  certain  information  about  the  failure  pattern 
must  become  common  knowledge.  While  it  is  the  failure  pattern  alone  that 
determines  what  views  are  contained  in  V,  it  is  difficult  to  characterize  what 
properties  of  the  failure  pattern  lead  to  these  views  being  chosen  by  the 
construction,  and  hence  what  kinds  of  facts  about  the  failure  pattern  become 
common  knowledge.  On  the  other  hand,  information  about  the  input  that 
follows  from  the  views  in  V  does  characterize  in  a  crisp  way  what  facts  about 
the  input  are  common  knowledge.  Furthermore,  it  is  easy  to  deduce  from 
V'  whether  the  existence  of  a  failure  is  common  knowledge.  As  the  following 
corollary  will  show.  Theorem  3.17  implies  that  facts  about  the  input  and 
existence  of  failures  that  are  common  knowledge  at  the  point  (r,/)  must 
follow  directly  from  the  set  V{r,t).  We  now  make  this  statement  precise.  A 
run  r,  a  set  of  processors  G,  and  a  time  k  determine  a  joint  view  V  =  rG{k). 
We  denote  by  “V”  the  property  of  being  a  run  in  which  the  processors  in  G 
have  the  joint  view  V  at  time  k  (notice  that  G  and  k  are  uniquely  determined 


'  »»  ^  J  T  *  ft  / 1  \  If  mi-  •  i*i* 

oy  v  }.  in  oiner  woras,  (r, p  /  ui  —  V  •  Thus,  if  V  D  <p  is  valid  in 


the  system,  then  every  run  r'  satisfying  rQ{k)  =  V  must  also  satisfy  (p.  We 


now  have: 
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Corollary  3.18:  Let  ^  be  a  fact  about  the  input  and  the  existence  of  fail¬ 
ures,  and  let  V  =  V{r,t).  Then  (r,^)  j=  C^/tp  iff  V  D  (p  is  valid  in  the 
system. 


Proof:  Suppose  V  D  (p  is  valid  in  the  system.  By  Lemma  3.14,  we 

have  V{r,l)  =  V{r',l)  for  all  runs  r'  such  that  (r,^)  ~  (r',  i),  and  hence  that 
(r',  1=  F  for  all  such  r'.  Given  that  V  D  ^  is  valid  in  the  system,  we  have 

(r',^)  (=  ^  for  all  such  r'.  It  follows  that  (r,/)  |=  Cj/ip. 

For  the  other  direction,  suppose  that  V  D  (p  is  not  valid  in  the  system. 
Since  V  D  ^  is  not  valid  in  the  system,  let  u  be  a  run  such  that  (u,  /)  [=  V 
and  yet  (u,/)  ^  tp.  We  will  construct  a  run  a  such  that  (r,f)  ~  (s,f),  a  and  u 
have  the  same  input,  and  a  and  u  are  the  same  with  respect  to  the  existence 
of  failures  (i.e.,  a  will  be  failure-free  iff  u  is).  Since  ip  is  b.  fact  about  the 
input  and  the  existence  of  failures,  {u,i)  ^  (p  will  imply  (s,f)  ^  (p.  Since,  in 
addition,  (r,f)  (s,f),  we  will  have  that  {r,t)  ^  Cj^(p. 

We  construct  a  in  two  steps.  We  first  construct  a  run  v  with  the  input  of  u 
satisfying  (r,f)  ~  {v,l).  Let  v  be  the  run  with  the  failure  pattern  of  r  and 
the  input  of  u.  Given  that  r  and  v  have  the  same  failure  pattern,  and  that  G 
and  k  depend  only  on  the  failure  pattern,  we  have  that  G(r,/)  =  G{v,t)  and 
k{r,l)  =  k{v^l).  Let  us  denote  these  values  by  G  and  k.  Since  {u,l)  |=  V, 
we  have  v{G,  r,  k)  =  v{G,  u,  k),  and  hence  Q^{r,  k)  =  k).  Since  v  and  r 

have  the  same  failure  pattern,  the  unlabeled  graphs  underlying  G^iv,  k)  and 
G^ir,  k)  (and  hence  also  G^iV’i  k))  are  the  same.  Furthermore,  since  v  and  u 
have  the  same  input,  it  follows  that  k)  and  G^(u,  k)  (and  hence  also 
G^{r,k))  are  equal.  Since  G^{T,k)  =  Gd{v,k)  implies  V{r,l)  =  V{v,l),  we 
have  {r,l)  ~  (u,f)  by  Lemma  3.16. 

We  now  consider  the  existence  of  failures,  and  construct  the  desired  run  a. 
If  there  is  a  failure  in  u,  then  let  a  be  a  run  differing  from  v  only  in  that  a 
processor  fails  after  time  t  in  a.  Clearly  {v,t)  ~  and  hence  (r,^)  ~ 
(a,  £).  Conversely,  if  u  is  failure-free,  then  let  a  =  u.  Since  u  is  failure-free, 

A  A 

no  processor  in  G  knows  of  a  failure  at  time  k  in  u.  Since  processors  in  G 
have  the  same  local  state  at  time  k  in  both  u  and  r,  the  same  is  true  of  r. 

AAA  A  A 

It  follows  that  B  =  B{G,r,k)  is  empty,  and  since  G  =  P  —  B,  we  have 
that  G  =  F.  Notice  that  a  differs  from  v  only  in  that  processors  in  0  =  F 
do  not  fail  in  a,  and  hence  that  {v,i)  ~  (a,/)  by  Lemma  3.15.  Therefore, 
(r,i)  ~  (a,/).  In  either  case,  {r,t)  ~  {a,£),  a  and  u  have  the  same  input. 
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and  are  the  same  with  respect  to  the  existence  of  failures.  It  follows  by  the 
above  discussion  that  (r,  ^  Cf/(p.  □ 

Corollary  3.18  summarizes  the  sense  in  which  the  construction  allows  us 
to  test  whether  relevant  facts  are  common  knowledge  at  a  given  point.  Let 
us  consider  the  computational  complexity  of  performing  such  tests.  The 
first  step  in  applying  Corollary  3.18  to  determine  whether  a  fact  is  common 
knowledge  at  (r,/)  is  to  construct  K(r,^).  Recall  that  a  group  of  proces¬ 
sors  implicitly  knows  that  a  processor  is  faulty  iff  it  k^ows  of  a  message  the 
processor  failed  to  send.  This  is  an  easy  fact  to  check  given  the  communi¬ 
cation  graph  corresponding  to  the  group’s  view.  It  follows  that  computing 
every  iteration  of  the  construction  can  easily  be  done  in  polynomial  time. 
Furthermore,  since  the  construction  is  guaranteed  to  converge  within  t  -t- 1 
iterations,  it  follows  that  G  and  k,  and  hence  also  V  can  be  computed  locally 
in  polynomial  time  (as  long  as  V  is  of  polynomial  size).  Recall  that  if  is 
a  practical  fact,  then  it  is  possible  to  determine  in  polynomial  time  whether 
or  not  y  D  yj  is  valid  in  the  system.  Thus,  given  a  practical  simultaneous 
choice  problem  C,  one  polynomial-time  implementation  of  a  test  for  com¬ 
mon  knowledge  of  enabled{ai)  is  to  construct  the  set  F  =  V"  and  determine 
whether  V  D  enabled{ct)  is  valid  in  the  system.  As  a  result,  Theorem  3.4 
implies  the  following: 


Theorem  3.19:  If  C  is  an  implementable,  practical  simulteineous  choice, 
then  there  is  a  polynomial-time  optimal  protocol  for  C. 


Discussion 


We  reiterate  the  fact  that  the  resulting  protocol  for  C  is  optimal  in  all  runs: 
actions  are  performed  in  runs  of  as  soon  as  they  could  possibly  be  per¬ 
formed  in  runs  of  any  other  protocol,  given  the  operating  environment  of  the 
run.  Thus,  for  example,  simultaneous  Byzantine  agreement  is  performed  in 
anywhere  between  2  and  t  +  1  rounds,  depending  on  the  pattern  of  failiures 
(as  is  shown  in  [DM86]  to  be  the  case  in  the  crash  failure  model).  Similarly, 
the  firing  squad  problem  can  be  performed  in  anywhere  between  1  and  t+ 1 


rounds  after  a  “start”  signal  is  received. 


X  XU.  ckix  uxi.v«0C  CiISww) 


the  simultaneous  actions  can  be  performed  quickly  only  when  many  failures 
become  known  to  the  nonfaulty  processors.  In  particular,  if  there  are  no 
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failures,  no  fact  about  the  input  is  common  knowledge  less  than  t  + 1  rounds 
after  it  is  first  determined  to  hold. 

Recall  that  every  processor,  faulty  or  nonfaulty,  is  able  to  compute  the 
set  V{t,1)  locally.  As  a  result,  the  following  proposition  shows  that  a  fact  is 
common  knowledge  to  the  nonfaulty  processors  iff  it  is  common  knowledge 
to  all  processors. 

Proposition  3.20:  Let  ip  be  an  arbitrary  fact.  In  the  omissions  model, 
C^ip  =  Cp(p  is  valid  in  all  systems  running  a  full-information  protocol. 

Proof:  By  Theorem  2.3,  it  is  enough  to  show  that  (r,^)  ^  iff  (r,f)  ~ 
(r',  for  all  runs  r  and  r'  and  times  £.  The  ‘if’  direction  is  trivial,  since  AT  C 
P.  The  proof  of  the  other  direction  is  identical  to  the  proof  of  Lemma  3.14, 
interpreting  ~  as  ~.  □ 

.Proposition  3.20  implies  that  all  processors  (even  the  faulty  processors) 
know  exactly  what  actions  are  commonly  known  to  be  enabled  in  runs  of  !Fc- 
Thus,  in  this  model  the  protocol  Tc  is  guaranteed  to  satisfy  a  stronger  version 
of  simidtaneous  choice  problems,  in  which  confUtion  (ii)  is  replaced  by 

(iP)  if  <n  is  performed  by  any  processor  (faulty  or  nonfaulty),  then  it  is 
performed  by  all  processors  simultaneously. 

Furthermore,  since  when  an  action  is  performed  it  is  performed  simultane¬ 
ously  by  all  processors,  and  since  no  other  action  is  ever  performed,  there  is 
no  need  for  processors  to  continue  sending  messages  after  performing  actions 
in  runs  of  !Fc  in  this  model.  We  can  therefore  further  reduce  the  communica¬ 
tion  of  !Fc  hy  having  processors  halt  after  performing  a  simultaneous  action. 
As  a  result,  the  following  is  an  optimal  protocol  for  any  implementable  simul¬ 
taneous  choice  problem  C,  an  optimal  protocol  simpler  than  the  protocol 

repeat  every  round 

send  current  local  state  to  every  processor 
until  C//enab!ed(ai)  holds  for  some  a,-; 
j  min{i  :  C^enahhdiai)  holds}; 
perform  Uj; 
halt. 
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The  fact  that  in  the  omissions  model  the  information  in  V{r,t)  is  essen¬ 
tially  all  that  is  common  knowledge  at  a  given  point  has  interesting  implica¬ 
tions  about  the  type  of  simultaneous  actions  that  can  be  performed  in  this 
model.  For  example,  recall  that  in  the  traditional  simultaneous  Byzantine 
agreement  or  consensus  problems  (see  [PSL80,  Fis83,  DM86]),  the  processors 
are  only  required  to  decide,  say,  v  in  case  they  aU  start  with  an  initial  value 
of  V.  A  stronger  (and  arguably  more  natural,  or  at  least  democratic)  require¬ 
ment,  however,  would  require  they  decide  v  whenever  the  majority  of  initial 
vsilues  are  v.  This  is  clearly  impossible,  since  some  processors  may  be  silent 
throughout  the  run.  However,  consider  a  protocol  for  simultaneous  Byzan¬ 
tine  agreement  which  is  similar  to  Tct  except  that  when  some  enabled{ai) 
becomes  common  knowledge  (which  happens  exactly  when  V  becomes  non¬ 
empty),  the  processors  choose  the  value  that  appears  in  the  majority  of  the 
initial  values  recorded  in  F(r,^)  as  their  decision  value.  In  this  case,  the 
processors  actually  approximate  majority  fairly  well:  If  more  than  (n  -f  t)/2 
of  the  initial  values  are  u,  then  v  will  be  chosen.  In  fact,  we  can  show  that 
the  approximation  is  bad  only  in  runs  in  which  agreement  is  obtained  early. 
In  particular,  if  agreement  cannot  be  obtained  before  time  t  -f  1  (this  woiild 
happen  in  runs  r  for  which  ^(r,^)  contains  only  empty  local  states  for  every 
’  <  t),  then  the  value  agreed  upon  would  be  the  majority  value  in  case  more 
than  (n  +  l)/‘2  of  the  processors  have  the  same  initial  value.  Furthermore,  a 
protocol  for  weak  (exact)  majority  does  exist:  A  protocol  that  either  decides 
that  there  was  a  failure  or  decides  on  the  true  majority  value. 

Since  messages  from  faxilty  processors  can  convey  new  information  about 
the  failure  pattern,  such  messages  do  affect  the  construction.  Therefore,  the 
behavior  of  faulty  processors,  even  after  they  have  been  discovered  to  be 
faulty,  plays  an  import^mt  role  in  determining  what  facts  become  common 
knowledge  and  when.  In  the  crash  failure  model,  however,  a  failed  processor 
does  not  communicate  with  other  processors  after  its  failing  round  and  has 
little  impact  on  what  facts  become  common  knowledge.  This  is  an  essen¬ 
tial  property  of  the  ::;missions  model  distinguishing  it  &om  the  crash  failure 
model. 

We  note,  however,  that  all  of  the  analysis  in  this  subsection  applies  to  the 
crash  failure  model,  with  all  of  the  proofs  applying  verbatim  when  restricted 
to  the  crash  failure  model.  We  thus  have: 

Proposition  3.21:  In  the  crash  failure  model,  (r,^)  ]=  G//ip  iff  it  is  the  case 
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that  1=  (p  for  all  r'  satisfying  y(r(^)  =  V(r',i). 

Thus,  the  set  V(r,i)  completely  characterizes  what  facts  are  common  knowl¬ 
edge  at  the  point  (r,i)  in  the  crash  failure  model  as  well.  Since  the  same 
proofs  show  that  the  construction  characterizes  the  connected  components  of 
the  similarity  graph  in  both  the  omissions  and  the  crash  failure  model,  the 
similarity  graph  in  the  omissions  model  is  simply  an  extension  of  the  simi¬ 
larity  graph  in  the  crash  failure  model  obtained  by  adding  nodes  and  edges 
to  the  similarity  graph  in  the  crash  failure  model,  not  breaking  up  the  con¬ 
nected  components  appearing  in  the  crash  failure  model.  This  implies  that 
in  a  run  of  the  omission  model  having  a  failure  pattern  consistent  with  the 
crash  failure  model,  exactly  the  same  facts  about  the  input  and  the  existence 
of  failures  are  common  knowledge  at  any  given  time  in  both  the  crash  failure 
and  the  omissions  model.  (However,  as  a  result  of  the  difference  in  the  types 
of  failures  possible  in  the  two  failure  models,  different  facts  about  the  failure 
pattern  are  common  knowledge  at  the  corresponding  points.)  Ruben  Michel 
has  independently  characterized  the  similarity  graph  in  variants  of  the  crash 
failure  model  (see  (Mic88]).  For  the  crash  failure  model  itself,  he  has  an  al¬ 
ternative  construction  that  also  characterizes  the  connected  components  of 
the  similarity  graph. 

As  in  the  omissions  model,  it  follows  from  Proposition  3.21  that  our 
construction  can  be  used  to  derive  efficient  optimal  protocols  for  simultaneous 
choice  problems  in  the  crash  failure  model,  thus  showing  that  results  sinailar 
to  those  proven  in  [DM86]  in  the  crash  model  can  be  obtriined  in  the  omissions 
model,  although  our  techniques  are  quite  different.  We  therefore  have  the 
following: 

Corollary  3.22:  Let  C  be  an  implementable,  practical  simultaneous  choice. 
In  the  crash  failure  model,  there  is  a  polynomial-time  optimal  protocol  for  C. 

As  a  final  remark,  let  ki  and  Gi  be  the  intermediate  results  of  beginning 
the  construction  at  the  point  (r,£),  and  denote  v(Gi,r,ki)  by  Vi.  Consider 
the  operator  £  defined  by  £{Vi)  =  Vi+r  for  all  t.  We  find  it  interesting  that  V, 
which  is  a  fixed  point  of  the  operator  £,  characterizes  the  facts  <p  for  which 
C//<p  holds,  where  we  know  from  [HM84]  that  Cf/ip  is  a  fixed  point  of 
(see  Proposition  2.6).  While  researchers  are  used  to  thinking  sememtically 
of  common  knowledge  as  a  fixed  point,  this  construction  shows  how  we  can 
think  combinatorially  of  common  knowledge  as  a  fixed  point,  as  well. 
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3.5.2  Receiving  Omissions 

In  the  omissions  model,  faulty  processors  fail  only  to  send  messages.  In  this 
subsection,  we  consider  the  symmetric  receiving  omissions  model,  in  which 
faulty  processors  fail  only  to  receive  messages.  While  at  iirst  glance  these 
models  seem  quite  similar,  they  are  actually  extremely  different.  In  partic¬ 
ular,  we  will  see  that  testing  for  common  knowledge  in  this  model  becomes 
trivial.  As  a  result,  efficient  optimal  protocols  for  practical  simultaneous 
choice  problems  become  completely  trivi2d  in  this  model. 

One  intriguing  difference  between  the  omissions  model  and  the  receiving 
omissions  model  is  the  following.  We  have  seen  in  the  omissions  model  that  in 
some  cases  a  fact  (for  example,  the  arrival  of  a  “start”  signal)  does  not  become 
common  knowledge  until  as  many  as  1 4- 1  rounds  after  it  is  first  determined 
to  hold.  Intuitively,  the  attainment  of  common  knowledge  is  delayed  by  the 
possibility  that  a  processor  might  fail  to  send  a  message  determining  that  the 
fact  holds.  However,  in  the  receiving  omission  model  even  faulty  processors 
send  all  messages  required  by  the  protocol.  Since  nonfaulty  processors  receive 
all  messages  sent  to  them,  in  runs  of  a  full-information  protocol  all  nonfaulty 
processors  have  a  complete  view  of  the  first  k  rounds  at  time  k  +  1.  We  can 
thus  show  the  following: 


Theorem  3.23:  Let  ^  be  a  fact  about  the  first  k  rounds.  In  the  receiving 
omissions  model,  (r,  A)  |=  y?  iff  (r,  fc  1)  [=  C//<p. 


The  proof  of  this  result  depends  on  the  notion  of  a  fact  being  valid  at  time  ie: 
A  fact  (f  is  said  to  be  valid  (in  the  system)  at  time  k  if  for  all  runs  r  we  have 
(r,  k)  1=  (p.  We  remark  that  the  following  variant  of  the  induction  rule  holds: 


If  (pD  Esfp  is  valid  at  time  k, 

then  (p  D  Cs<p  is  valid  at  time  k. 


Proof:  Since  (pis  a  fact  about  the  first  k  rounds,  (r,  fc)  [=;  y?  iff  (r,k  +  l)h 
(p.  Thus,  it  is  enough  to  show  that  (r,  fc  -[- 1)  1=  <p  iff  (r,  fc  -f  1)  [=  Cu<P- 
Notice  that  {r,k-\- 1)  [=  C//(p  implies  {r,k  +  1)  \=  (p.  Conversely,  suppose 
(r, k+1)  1=  During  round  A:  -f  1  in  r  every  processor  sends  its  entire 
local  state  to  all  processors,  so  at  time  A:  4- 1  all  nonfaulty  processors  have  a 
complete  view  of  the  first  k  rounds  of  r.  Since  is  a  fact  about  the  first  h 
rounds,  (r,  fc  4-  1)  [=  Ef/(p.  We  have  just  shown  that  (p  D  E//(p  is  vab'd  at 
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time  A  +  1,  so  yj  D  Cf/tp  is  valid  at  time  +  1  as  well.  Thus,  (r,  fc  +  1)  |=  ^ 
implies  (r,  /!+!)[=  Cj/(p.  □ 

As  a  consequence  of  Theorem  3.23,  polynomial-time  optimal  protocols  for 
practical  simultaneous  choice  problems  are  very  simple  in  this  model.  Again, 
by  polynomial-time  here  we  will  mean  polynomial  in  n,  t,  and  the  round 
number  t. 

Corollary  3.24:  Let  C  be  an  implementable,  practical  simultaneous  choice. 
In  the  receiving  omissions  model,  there  is  a  polynomial-time  optimal  protocol 
for  C. 

Proof:  Since  C  is  implementable.  Theorem  3.4  implies  that  J^c  is  an  optimal 
protocol  for  C.  It  remains  to  show  that  J^c  can  be  implemented  in  polynomial 
time.  Since  the  messages  sent  by  can  clearly  be  computed  in  polynomial 
time,  we  need  only  show  how  to  implement  the  tests  for  conunon  knowledge 
of  the  conditions  enabled{ai)  in  polynomial  time.  We  claim  that  (r,f)  |= 
C//enabled{ai)  iff  0{r,i  —  1)  D  enabled{ai)  is  valid  in  the  system.  Since  C 
is  a  practical  simultaneous  choice  problem,  determining  whether  Q{r,t  — 
1)  D  enabled{ai)  is  valid  in  the  system  can  be  done  in  polynomial  time. 
As  Q(r,t  —  1)  can  be  determined  by  all  nonfaulty  processors  at  (r,f)  in 
polynomied-time,  this  will  yield  a  polynomial-time  implementation  of  a  test 
for  common  knowledge  of  enabled{ai),  and  we  will  be  done.  Suppose  Q{r,t  — 
1)  D  enabled{ai)  is  valid  in  the  system.  Theorem  3.23  implies  that  Q{r,i—  1) 
is  common  knowledge  at  (r,f),  and  it  follows  that  (r,f)  |=  C f/enabled{ai). 
Conversely,  suppose  (r,f)  |=  C//enabled{ai).  Let  a  be  a  run  satisfying  Q{r,l— 
1).  A  proof  simile  to  the  base  case  of  Lemma  3.8  shows  that  (r,f)  ~  (5,^). 
Since  (r,f)  [=  Cj/cnable^ai),  it  follows  that  (s,/)  |=  enabled{ai).  Thus, 
Q{r, i—l)D  enabled{ai)  is  valid  in  the  system,  as  desired.  □ 

The  results  of  this  section  point  out  a  number  of  interesting  diiferences 
between  the  omissions  model  and  the  receiving  omissions  model.  For  ex¬ 
ample,  consider  the  distributed  firing  squad  problem.  First,  Theorem  3.23 
implies  that  all  nonfaulty  processors  are  able  to  hie  in  the  receiving  omission 
model  exactly  one  round  after  the  first  "start”  signal  is  received.  Recall  that 
in  the  omissions  model,  firing  may  delayed  as  many  as  t  -fl  rounds.  Second, 
since  a  faulty  processor  p  might  fail  to  receive  all  messages,  it  is  not  possible 
to  guarantee  that  p  will  ever  fire  following  the  receipt  of  a  “start”  signal 
by  a  nonfaulty  processor.  In  the  omissions  model  we  have  shown  that  it  is 
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possible  to  guarantee  that  all  processors  perform  any  action  (e.g.,  “firing”) 
performed  by  the  nonfaulty  processors.  Finally,  notice  that  faulty  processors 
may  sometimes  be  unable  to  halt,  or  terminate  their  participation  in  a  dis¬ 
tributed  firing  squad  protocol,  even  long  after  the  nonfaulty  processors  have 
fired:  A  processor  p  receiving  no  messages  or  “start”  signals  at  all  can  never 
halt  since  at  any  point  it  is  possible  (according  to  p’s  local  state)  that  it  will 
be  the  only  processor  in  the  system  to  receive  a  “start”  signal.  In  this  case, 
optimal  protocols  must  require  the  nonfaulty  processors  to  fire  one  round 
later,  and  hence  p  must  be  able  to  send  this  information  to  the  nonfaulty 
processors.  In  contrast,  in  the  omissions  model  it  is  possible  to  guarantee 
that  all  processors  halt  as  soon  as  an  action  is  performed  in  the  system. 
These  remarks  show  that  while  at  first  glance  the  assignment  of  responsibil¬ 
ity  for  undelivered  messages  to  sending  or  to  receiving  processors  may  seem 
arbitrary,  the  assignment  has  a  dramatic  effect  on  when  facts  become  com¬ 
mon  knowledge,  and  hence  on  the  behavior  of  optimal  protocols.  Since  such 
a  simple  modification  of  the  omissions  model  results  in  the  collapse  of  the 
combinatorial  structure  underlying  the  model  (witness  Theorem  3.23),  we 
consider  this  to  be  an  indication  that  the  omissions  model  is  not  a  robust 
model  of  failure. 

3.5.3  Generalized  Omissions 

We  have  just  seen  that  the  choice  of  whether  sending  or  receiving  proces¬ 
sors  are  responsible  for  undelivered  messages  has  a  dramatic  effect  on  the 
structure  of  the  omissions  model.  Consider,  however,  the  generalized  omis¬ 
sions  model,  in  which  a  faulty  processor  may  fail  both  to  send  and  to  receive 
messages.  This  section  is  concerned  with  the  design  of  optimal  protocols 
for  simultaneous  choice  problems  in  this  model.  We  have  seen  that  Theo¬ 
rem  3.4  implies  the  protocol  !Fc  is  an  optimal  protocol  in  this  model,  and 
that  Theorem  3.7  implies  this  protocol  can  be  implemented  in  polynomial- 
space.  As  in  previous  sections,  the  remaining  question  is  whether  there  are 
efficient  optimal  protocols  in  this  model.  The  principal  result  of  this  section 
is  that  testing  for  common  knowledge  in  the  generalized  omissions  model  is 
NP-hard.  Using  the  close  relationship  between  common  knowledge  and  si¬ 
multaneous  actions,  we  obtain  as  a  corollary  that  optimal  piotocols  for  most 
any  simultaneous  choice  problem  in  this  model  require  processors  to  perform 
NP-hard  computations.  Consequently,  for  example,  in  this  model  there  can 
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be  no  efficient  optimal  protocol  for  simultaneous  Byzantine  agreement  or  the 
distributed  firing  squad  problem.  This  is  a  dramatic  difference  between  the 
generalized  omissions  model  and  the  more  benign  failure  models,  where,  as 
we  have  seen,  efficient  optimal  protocols  do  exist. 

One  important  difference  between  the  generalized  omissions  model  and 
simpler  variants  of  the  omissions  model  is  that  in  the  generalized  omissions 
model  undelivered  messages  do  not  necessarily  identify  the  set  of  faulty  pro¬ 
cessors,  but  merely  place  constraints  on  their  possible  identities:  Either  the 
sender  or  the  intended  receiver  of  every  undelivered  message  must  be  faulty. 
The  faulty  processors  must  therefore  induce  a  “vertex  cover”  of  the  unde¬ 
livered  messages.  Recall  that  in  our  analysis  of  the  omissions  failure  model, 
determining  the  number  and  the  identity  of  the  faulty  processors  given  the 
labeled  communication  graph  of  a  point  played  a  crucial  role  in  characterizing 
the  facts  that  are  common  knowledge  at  a  point.  In  that  model,  a  processor 
is  known  to  be  faulty  iff  it  is  known  that  a  message  it  was  supposed  to  send 
was  not  delivered,  a  fact  easily  determined  hrom  the  labeled  communication 
graph.  In  the  generalized  omissions  model,  however,  even  determining  the 
number  (and  not  necessarily  the  identities)  of  processors  implicitly  known 
to  be  faulty  essentially  involves  computing  the  size  of  the  minimal  vertex 
cover  of  a  graph,  a  problem  known  to  be  NP-complete  (see  [GJ79]).  It  is 
with  this  intuition  that  we  now  proceed  to  show  that  determining  whether 
certain  facts  are  common  knowledge  is  computationally  prohibitive  in  the 
generalized  omissions  model,  assuming  P^NP. 

However,  in  order  to  study  the  complexity  of  testing  for  common  knowl¬ 
edge  in  the  generalized  omissions  model  in  a  meaningful  way,  we  are  once 
again  faced  with  the  need  to  restrict  our  attention  to  a  cleiss  of  facts  that 
includes  all  of  the  facts  that  may  arise  in  natural  simultaneous  choice  prob¬ 
lems,  and  excludes  anomalous  cases.  For  example,  if  (p  is  valid  in  the  system, 
then  so  is  C//ip,  and  testing  whether  (p  is  common  knowledge  is  a  trivial  task. 
On  the  other  hand,  one  can  imagine  facts  involving  excessive  computational 
complexity  of  a  type  irrelevant  to  simultaneous  choice  problems.  Consider, 
for  instance,  a  fact  tp  with  the  property  that  the  communication  graph  of  any 
point  satisfying  (p  encodes  information  allowing  the  solution  of  all  problems 
in  NP  of  size  smaller  than  the  number  of  processors  in  the  system.  Whereas 
it  seems  unlikely  that  such  a  fact  exists,  such  a  statement  is  probably  very 
h2Lrd  to  prove,  and  it  is  definitely  not  the  business  of  this  chapter  to  do  so. 

We  are  therefore  led  to  make  the  following  restriction.  A  fact  <p  is  said  to 
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be  admissible  within  a  class  of  systems  running  a  full-information  protocol  if 

(i)  for  all  systems  within  this  class  neither  tp  nor  -itp  is  valid  in  the  system, 
and  (ii)  there  is  a  polynomial-time  algorithm  explicitly  constructing  for  each 
system  a  labeled  communication  graph  U(r,i)  of  minimal  length  having  the 
property  that  Q(r,£)  D  <p  is  valid  in  the  system.  Condition  (i)  simply  says 
that  in  none  of  these  systems  is  testing  for  <p  completely  trivial.  Condition 

(ii)  says  that  in  each  of  these  systems  it  has  to  be  easy  to  generate  enough  of  a 
communication  graph  to  guarantee  that  p  is  true  at  any  point  of  any  run  with 
this  communication  graph.  The  ability  to  generate  such  a  graph  will  be  used 
to  generate  the  graph  we  submit  to  a  given  test  for  common  knowledge  of  <p. 
We  say  that  a  simultaneous  choice  problem  C  is  admissible  if  each  condition 
enablec^ai)  is  admissible  within  the  class  of  systems  determined  by  a  full- 
information  protocol  and  C.  We  claim  that  any  natural  simultaneous  choice 
is  admissible.  We  can  now  state  the  fundamental  result  of  this  section  which 
says,  loosely  speaking,  that  testing  for  common  knowledge  of  admissible  facts 
^1, . . .  is  NP-hajd. 

For  given  facts  y>i, . . . ,  <^6  (6  >  1)  and  a  class  S  =  {S{n,  t)  :  n  >  t  -f  2} 
of  systems,  define  the  decision  problem  of  testing  for  common  knowledge  of 
in  E  as  follows:  Given  as  input  a  graph  Qi{r,t)  corresponding 
to  Pi’s  local  state  at  a  point  {r,i)  of  a  system  in  S  with  n  >  2t,®  does 

Lemma  3.25:  Let  ^,, . . .  be  admissible,  practical  facts  within  a  class  L 
of  systems  running  a  full-information  protocol  in  the  generalized  omissions 
model.  The  problem  of  testing  for  common  knowledge  oi  ...  ,ipb  in  S  is 
NP-hard  (in  n). 


The  proof  of  Lemma  3.25  will  follow  shortly.  Notice,  however,  that  t  is 
variable  in  the  statement  of  this  lemma,  and  in  general  may  be  0(n).  The 
proof  of  this  result  will  not  apply  for  a  fixed  t,  nor  to  cases  in  which  t  is 
restricted,  say,  to  be  O(logn).  In  any  case,  it  will  follow  that  any  standard 
implementation  of  our  optimal  knowledge-bsised  protocols  must  be  compu¬ 
tationally  intractable,  unless  P=NP.  It  is  natural  to  ask  whether  this  ineflR- 
ciency  is  merely  the  result  of  having  programmed  our  protocols  using  tests 


"We  note  that  the  condition  n  ^  2i  seems  odd,  bui  this  slightly  stiouger  formulation  of 
testing  for  common  knowledge  is  needed  later  when  proving  the  intractability  of  optimal 
protocols  for  simultaneous  choice  problems  in  this  model. 
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for  common  knowledge.  It  is  conceivable,  for  instance,  that  there  are  opti¬ 
mal  protocols  for  admissible  simiiltaneous  choice  problems  in  the  generalized 
omissions  model  that  are  computationally  efficient.  Intuitively,  however,  in 
order  to  perform  a  simultaneous  action,  an  optimal  protocol  V  n.  :st  essen¬ 
tially  determine  whether  any  of  the  conditions  tnableS^ai)  is  comm  m  knowl¬ 
edge.  Corollary  3.3  implies  that  such  a  condition  becomes  common  knowledge 
during  the  corresponding  run  of  a  full-information  protocol  as  soon  as  it  does 
during  a  run  of  P.  Thus,  an  optimal  protocol  P  must  essentially  determine 
whether  such  a  fact  is  common  knowledge  during  the  corresponding  run  of  a 
full-information  protocol  p.  Since  Lemma  3.25  implies  that  this  problem  is 
NP-hard,  computing  the  function  P  must  be  NP-haxd  as  well.  We  now  make 
this  argument  precise. 

Recall  that  a  protocol  is  formally  a  function  mapping  n,  t,  and  a  proces¬ 
sor’s  state  to  a  list  of  the  actions  the  processor  should  perform,  followed  by  a 
list  of  the  messages  it  is  required  to  send  in  the  following  round.  We  say  that 
a  protocol  is  communication-efficient  if  in  a  system  of  n  processors  the  size  of 
the  messages  each  processor  is  required  to  send  during  round  I  is  polynomial 
in  n  and  L  In  the  following  result  we  show  that  the  problem  of  computing 
the  function  corresponding  to  a  communication-efficient  optimal  protocol  for 
a  simultaneous  choice  problem  is  NP-hard.  Hence,  no  such  protocol  can  be 
computationally  efficient,  unless  P=NP. 

For  a  given  protocol  P  and  class  S  =  {S(7i,  t)  :  n  >  t  -1-  2}  of  systems, 
define  the  problem  of  computing  P  in  E  as  follows:  Given  as  input  a  graph 
Qi{T,l)  corresponding  to  p.-’s  local  state  at  a  point  (r,/)  of  a  system  in  S, 
output  the  list  of  messages  p,-  is  required  by  P  to  send  at  (r,^),  and  output 
the  list  of  actions  p,-  is  required  by  P  to  perform  at  (r,  1). 

Theorem  3.26:  Let  C  be  an  admissible,  practical  simultaneous  choice  with 
actions  oi, . . . ,  o^,  and  let  P  be  a  communication-efficient,  optimal  protocol 
for  C.  Let  S  be  the  class  of  systems  determined  by  P  and  C.  There  is 
a  Turing  reduction  from  the  problem  of  testing  for  common  knowledge  of 
enabled(ai), ...,  enabled{ab)  in  S  to  the  problem  of  computing  P  in  E.  In 
this  sense,  the  problem  of  computing  7^  in  S  is  NP-hard  (in  n). 

Proof:  Notice  that  since  “P  is  a  protocol  for  C,  the  problem  C  must  be  im- 
plementable,  amd  Theorem  3.4  implies  that  the  full-information  protocol  Tc 
must  be  an  optimal  protocol  for  C.  Let  E  =  {S(ti,  t)  :  n  >  <  +  2}  be  the 
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class  of  systems  determined  by  C  and  Since  C  is  an  admissible,  practi¬ 
cal  simultaneous  choice,  each  condition  enabled{ai)  must  be  an  admissible, 
practical  fact  within  S,  By  Lemma  3.25,  given  the  graph  of  a  point 

(r,  in  a  system  S(n,  t)  with  n  >  2t,  the  problem  of  determining  whether 
(’•.<)  N  V  iC//enabled{ai)  is  NP-hard.  We  will  exhibit  a  Turing  reduction 
from  this  problem  to  the  problem  of  computing  V]  that  is,  given  the  graph 
G{r,  t)  of  a  point  (r,  in  a  system  S(n,  t)  with  n  >  2t,  we  will  show  how 
to  use  V  to  determine  in  polynomial  time  whether  {r,£)  \=  Vt  Cj^renabled{ai). 
Having  exhibited  such  a  reduction,  we  will  have  shown  that  the  problem  of 
computing  V  is  NP-hard. 

Let  r  be  a  run  of  Tc  in  a  system  B(n,  t)  with  n  >  2t,  and  let  s  be  the 
corresponding  run  of  V.  It  follows  from  the  definition  of  J^c  that  (r,^)  [= 
yiCu6nabled{ai)  iff  the  nonfaulty  processors  perform  a  simultaneous  action 
no  later  than  time  i  in  r.  Since  and  V  are  both  optimal  protocols  for  C, 
the  nonfaulty  processors  perform  simultaneous  actions  at  the  same  times 
during  r  and  s.  Since  n  >  2t,  there  must  be  at  least  t+1  nonfaulty  processors 
in  both  runs,  so  the  nonfaulty  processors  simultaneously  perform  an  action 
no  later  than  time  t  in  either  run  iff  t  -f  1  processors  do  so.  Therefore, 
(^»^)  N  \JiCuenabled{ai)  iff  t  -f  1  processors  perform  a  simultaneous  action 
no  later  than  time  I  in  3. 

One  algorithm  for  determining  whether  t  -f  1  processors  do  perform  a 
simultaneous  action  no  later  than  time  ^  in  a  is  to  construct  the  local  state  of 
each  processor  in  s  at  each  time  k  before  time  i,  and  use  V  to  determine  when 
processors  axe  required  to  perform  actions.  Suppose  we  have  constructed  the 
state  of  each  processor  at  time  ^  —  1  in  s;  let  us  consider  the  problem  of 
constructing  the  state  of  a  processor  p  at  time  k.  Processor  p’s  state  at  (s,  k) 
consists  of  p’s  name,  the  time  fc,  a  list  of  the  messages  received  by  p  during 
the  first  k  rounds  of  s,  and  a  list  of  the  input  received  bj'  p  during  the  first  k 
rounds  of  s.  Recall  that  since  r  is  a  run  of  a  full-information  protocol,  the 
graph  Gir^t)  is  actually  {in  encoding  of  the  operating  environment  during 
the  first  I  rounds  of  r,  and  hence  also  of  a.  Given  the  states  of  all  processors 
at  time  ^  —  1,  the  protocol  V  determines  what  message  each  processor  is 
required  to  send  to  p,  and  G{t,1)  determines  which  of  these  messages  are 
actually  delivered  to  p  in  a.  Since  V  is  communication-efficient,  each  of  these 
tnefisages  is  of  size  polynomial  in  n  and  k.  Furthermore,  the  input  received 
by  p  during  round  k  labels  the  node  (p,  k)  of  G{r,  1).  Since  C  is  practical,  this 
input  is  of  constant  size.  Thus,  given  each  processor’s  state  at  time  k  —  1, 
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we  can  use  the  graph  Q{ryt)  and  an  oracle  for  V  to  condtruct  the  state  of 
each  processor  at  time  A  of  s  in  polynomial  time.  (An  oracle  for  7^  is  an 
oracle  that,  given  the  state  of  a  processor  p  at  a  point  (r,/),  in  one  step 
determines  what  actions  V  requires  p  to  perform  at  time  and  constructs 
the  messages  V  requires  p  to  send  during  round  I  +  1.) 

Consider  the  following  algorithm: 

actionjperformed  <—  false; 

k  < —  0; 

repeat 

for  ail  processois  p  do 

determine  whether  V  requires  p  to  perform  any  action  at  time  k,  and 
construct  the  messages  V  requires  p  to  send  during  round  k  +  1; 

endfor 

if  <  +  1  processors  perform  actions  at  time  k 
then  action-performed-,  true; 
k  *-  k+l; 

until  k  >  I  or  action-performed; 

if  action-performed 
then  halt  with  “yes” 
else  halt  with  “no”. 

From  the  previous  discussion  it  is  clear  that  given  any  oracle  for  V,  this 
algorithm  determines  in  polynomial  time  whether  t  +  l  processors  perform 
actions  simultaneously  no  later  than  time  t  in  s,  and  hence  whether 
\/iCf/enahlcd{ai).  □ 

As  an  immediate  corollary  of  Theorem  3.26,  we  have  the  following: 

Corollary  3.27:  Let  C  be  an  admissible  practicid  simultaneous  choice  prob¬ 
lem.  If  there  is  a  polynomial-time  optimal  protocol  for  C,  then  P=NP. 

Corollary  3.27  implies  that  optimal  protocols  for  simultaneous  choice 
problems  as  simple  as  the  distributed  firing  squad  problem  or  simultane¬ 
ous  Byzantine  agreement  are  computationally  infeasible  in  the  generalized 
omissions  model,  assuming  P  ^  NP.  In  fact,  we  do  not  know  whether  these 
problems  can  be  implemented  in  polynomial  time  even  using  an  NP  oracle. 

The  best  we  can  do  in  the  generalized  omissions  model  is  implement  them 
using  polynomied-space  computations,  as  in  the  proof  of  Theorem  3.7.  We 
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consider  the  question  of  determining  the  exact  complexity  of  implementing 
admissible  practical  simultaneous  choice  problems  in  this  model  an  interest- 
ing  open  problem. 

We  now  proceed  to  prove  Lemma  3.25.  First,  however,  we  state  a  result 
that  will  be  very  useful  in  the  proof  of  Lemma  3.25.  Roughly  speaking,  it  says 
that  if  a  group  of  processors  can  (jointly)  prove  that  they  are  nonfaulty,  then 
their  states  become  common  knowledge  at  the  end  of  the  following  round. 

Lemma  3.28:  Let  5  be  a  set  of  processors  and  let  5  =  P  —  S.  Let  r  be 
a  run  of  a  full-information  protocol.  If  the  processors  in  S  implicitly  know 
at  (r,  ^  —  1)  that  S  contains  t  faulty  processors,  then  the  joint  view  of  S  at 
(r,^  —  1)  is  common  knowledge  at  (r,/). 

Proof:  Let  (p  =  “V  is  the  joint  view  of  S  at  time  t  —  1”,  where  V  = 
v(S,r,i  —  1).  Suppose  |=  tp.  Given  that  S  has  the  same  joint  view 
at  (r,f  —  1)  and  at  (r',£  —  1),  and  since  S  implicitly  knows  at  (r,£  —  1) 
that  S  contains  t  faulty  processors,  S  implicitly  knows  the  same  at  (r',  £  —  1). 
In  particular,  the  processors  in  S  must  be  nonfaulty  in  r',  and  each  must 
successftilly  send  its  state  to  all  processors  during  round  i  of  r'.  Since  all 
nonfaulty  processors  will  receive  these  messages,  we  have  (r',£)  |=  E,/ip.  It 
follows  that  p  D  Ej^p  is  valid  at  time  t,  and  the  induction  rule  implies 
p  D  CffP  is  valid  at  time  t  as  well.  Thus,  (r,  £)  ]=■■  p  implies  (r,  £)  |=  C//p.  □ 

(We  note  in  passing  that  a  converse  to  Lemma  3.28  is  also  true:  If  the  joint 
view  at  time  £  —  1  of  a  set  5  of  processors  is  common  knowledge  at  time  £, 
then  the  processors  in  some  set  S'  D  S  must  implicitly  know  at  time  £  —  1 
that  there  are  t  faulty  processors  among  the  members  of  S' .) 

In  addition  to  Lemma  3.28,  the  following  result,  analogous  to  Lemma  3.8 
in  the  omissions  model,  will  be  of  use  in  the  proof  of  Lemma  3.25. 

Lemma  3.29:  Let  r  and  r'  be  runs  differing  only  in  the  (faulty)  behavior 
displayed  by  processor  p  after  time  fc,  and  suppose  no  more  that  f  processors 
fail  in  either  r  o):  r'.  lf£— k<t  +  l  —  f,  then  (r,  1)  (r',  £). 

Proof:  The  proof  is  analogous  to  the  proof  of  Lemma  3.8,  with  the  added 
observation  that  if  p  sends  no  messages  after  (an  arbitrary)  time  k'  in  a,  then 
(a,£)  {s' ft)  where  s'  differs  from  a  in  that  p  receives  messages  from  an 
arbitrary  set  of  processors  during  round  k'.  □ 
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Finally,  as  previously  mentio’’ed,  the  proof  of  Lemma  3.25  involves  a 
reduction  from  the  Vertex  Cover  problem.  This  is  the  problem  (see  [GJ79]) 
of  determining,  given  a  graph  G  =  (V,E)  and  a  positive  integer  M,  whether  G 
has  a  vertex  cover  ol  size  M  or  less;  that  is,  a  subset  V  C  V  such  that  1V|  <  M 
and,  for  each  edge  {u,it;}  €  E,  at  least  one  of  v  or  tu  belongs  to  V. 

Theorem  [Karp]:  Vertex  Cover  is  NP-complete. 

We  now  prove  Lemma  3.25. 

Proof  of  Lemma  3.25;  We  will  exhibit  a  Turing  reduction  from  Vertex 
Cover  to  the  problem  of  testing  for  common  knowledge  of  (pi, ...  and  it 
will  follow  that  this  problem  is  NP-hard. 

Since  every  graph  G  =  (V,E)  is  jVj-coverable,  the  following  is  an  algo¬ 
rithm  for  Vertex  Cover: 

m  <—  jV|; 

while  G  has  a  vertex  cover  of  size  m  —  1  do 
m  m  —  1; 
if  m  <  Af 

then  return  “G  has  a  vertex  cover  of  size  M” 
else  return  “G  has  no  vertex  cover  of  size  M”. 


To  implement  this  test,  it  is  enough  to  implement  a  test  that,  given  an  m- 
coverable  graph  G,  determines  whether  G  is  (m  —  l)-coverable.  Every  graph 
G  =  {V,E)  clearly  has  a  vertex  cover  of  size  |  V]  —  1.  In  addition,  it  is  possible 
to  determine  whether  G  has  a  vertex  cover  of  size  j  V]  —  2  in  polynomial  time. 
Similarly,  it  is  easy  to  determine  whether  G  has  a  vertex  cover  of  size  0  in 
polynomial  time.  We  show  that  if  1  <  m  <  jV]  —  2  and  G  is  m-coverable, 
then  it  is  possible  to  construct  in  polynomial  time  a  graph  Q{r,  t)  with  the 
property  that  (r,  f)  |=  V»  iff  G  is  not  (m  —  l)-coverable.  The  point 
(r,i)  will  be  a  point  of  a  system  E(n,t)  with  n  >  2t  from  the  class  under 
consideration  (i.e.,  the  class  of  systems  running  a  full-information  protocol 
in  the  generalized  omissions  model).  Thus,  given  an  oracle  for  testing  for 
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(m  —  l)-coverability  of  G.  It  will  follow  that  testing  for  common  knowledge 
of  ^j, . . . , ^>6  is  NP-hard. 
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k  A  +  l  /  —  fc  +  3 


Figure  3.5:  Embedding  a  graph  G  in  a  ran  r. 


Fix  a  graph  G  =  (V,E)  and  an  integer  m  satisfying  1  <  m  <  —  2. 

Let  n  =  |F|  +  m  +  3  and  t  =  m  +  2,  and  let  S(n,t)  be  a  system  from  the 
class  under  consideration.  Notice  that  since  |F|  >  m  +  2,  we  have  n  >  2t. 
Since  each  fact  (pi  is  admissible,  for  each  (pi  we  can  explicitly  construct  in 
polynomial  time  a  labeled  communication  graph  (of  a  point  in  S(n,t))  of 
minimal  length  determining  (pi.  Of  these  graphs,  let  Q  be  one  of  minimal 
length,  say  of  length  k.  Let  r  be  a  run  of  S(n,  t),  illustrated  in  Figure  3.5, 
satisfying  the  following  conditions:  (i)  the  input  received  in  the  first  k  rounds 
of  r  is  the  same  as  in  and  no  input  is  received  after  time  k]  (ii)  all  messages 
in  the  first  k  rounds  Me  delivered;  (iii)  in  round  A  +  1,  the  only  undelivered 
messages  are  as  follows:  no  message  is  delivered  from  processor  to  ii^ 
round  fc  +  1  of  r  iff  there  is  an  edge  from  v  to  lu  in  G  (that  is,  the  graph  G 
is  represented  by  the  undelivered  messages  during  round  k  +  .1);  (iv)  two 
additional  processors  fi  and  /j  are  silent  from  time  k  +  1  inr,  and  all  other 
messages  after  time  ft  +  1  are  delivered;  and  (v)  a  set  S'  of  <  +  1  additional 
processors  do  not  fail  in  r.  Since  G  has  a  vertex  cover  V  of  size  m,  one  failure 
pattern  consistent  with  the  undelivered  messages  in  r  is  that  p„  is  faulty  for 
every  v  €  V  (accounting  for  the  undelivered  messages  during  round  fc+1  of  r) 
and  that  both  fi  and  /j  are  faulty.  Given  that  t  =  m+2  processors  fail  in  this 
failure  pattern,  there  is  a  run  r  of  S(n,  t)  satisfying  the  required  conditions. 
Since  the  graph  G  determining  the  input  of  ^(r,A!)  can  be  constructed  in 


polynomial  lime,  setting  I  ~  k  -to, 


polynomial  time  as  well.  It  remains  to  show  that  (r,  1)  (=  V»  iff  G  is 


not  (m  —  l)-coverable. 
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Suppose  G  has  no  vertex  cover  of  size  m  —  1,  and  let  F  be  the  set  of 
processors  failing  in  r.  Since  fi  and  /j  must  be  faidty  (each  fails  to  the 
t  +  1  processors  in  5),  F'  F  —  {/i,  /a}  must  account  for  every  undelivered 
message  during  round  A:  +  1.  If  there  is  an  edge  from  v  to  in  in  G,  then 
no  message  from  to  Pu,  is  delivered  in  round  k  +  1,  and  one  of  p„  or 
must  be  in  F'.  It  follows  that  F'  must  induce  a  vertex  cover  of  G.  Since  G 
has  no  vertex  cover  of  size  m  —  1,  F'  must  contain  at  least  m  processors, 
and  F  at  least  t  =  m  +  2.  Thus,  the  processors  in  S  implicitly  know  at  time 
A:  +  2  that  their  complement  ^  =  P  —  S  contains  t  faulty  processors.  By 
Lemma  3.28,  their  states  at  time  A:  +  2  must  be  common  knowledge  at  time 
A:  +  3.  These  states  contain  a  complete  description  of  Q(r,  A;),  and  hence  the 
identity  of  Q(r,  k)  is  common  knowledge  at  Recall  that  G  was  chosen 

to  be  a  graph  determining  y?,-  for  some  i.  If  G  does  not  specify  a  failure,  then 
G(r,  k)  =  G,  and  it  follows  that  (r,^)  |=  Cj/(pi.  On  the  other  hand,  if  G  does 
specify  a  failure,  then  (pi  is  determined  by  the  input  to  the  first  k  rounds 
of  G  and  the  existence  of  a  failure.  Notice  that  the  failure  of  /j  and  is  also 
recorded  in  the  view  of  S  at  time  A:  +  2,  and  hence  is  also  common  knowledge 
at  (r,  t).  Thus,  both  the  input  to  the  first  k  rounds  of  G  and  the  existence  of 
a  failure  are  common  knowledge  at  time  t,  and  it  follow  3  that  {r,t)  |=  Cu<Pi‘ 
In  either  case,  we  have  (r,  1)  }=  Vi 

Conversely,  suppose  G  does  have  a  vertex  cover  of  size  m  —  1.  Without 
loss  of  generality,  at  most  t  —  1  processors  fail  in  r.  First,  we  claim  that 
(r,  t)  ~  (s,  1)  where  s  is  a  failure-free  run  with  the  input  of  r.  Since  /i  and  /a 
feiil  only  after  time  A:  -f-  1  =  £  —  2,  two  applications  of  Lemma  3.29  imply 
that  (r,£)  ~  {t',£)  where  r'  differs  from  r  in  that  /i  and  /j  do  not  fail  in  r'. 
Since  at  most  t  —  3  processors  fail  in  r'  and  As  —  £  —  3,  by  Lemma  3.29  we 
have  {r',t)  ~  Second,  we  claim  that  for  each  <pi  there  is  a  run  Ui 

not  satisfying  tpi  that  differs  from  G  only  after  time  A?  —  1.  If  Af  =  0,  then 
since  tpi  is  admissible  and  hence  not  valid  in  the  system,  such  a  run  must 
certaixdy  exist.  On  the  other  hand,  if  As  >  0,  then  since  G  was  chosen  to  be  a 
labeled  communication  graph  of  minimal  length  determining  (pj  for  some 
such  a  run  must  exist  in  this  case  as  well.  Now,  let  u'-  be  a  run  having  the 
input  of  Ui,  in  which  no  processor  fails  before  time  I,  emd  in  which  processors 
become  silent  after  time  £  iff  there  is  a  failure  in  u,-.  Since  ipi  is  a  fact 
about  the  input  and  existence  of  failures,  and  since  does  not  satisfy  (pi, 
neither  does  Let  i  and  u'i  be  runs  of  F  in  the  omissions  model  having  the 
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operating  environments  of  s  and  «•,  respectively.  (Notice  that  these  operating 
environments  actually  are  operating  environments  of  the  omissions  model.) 
Notice  that  no  processor  fails  before  time  I  in  either  s  or  ti-.  It  follows  that 
G(s,£)  =  G{ui,i),  and  that  k{s,l)  =  We  denote  these  values  by  G 

and  k,  respectively.  Since  t  =  m  +  2  and  m  >  1,  we  have  that  t  >  3.  Thus, 
k  =  t  —  {t+l)  <  l—A  =  k  —  1.  Recall  that  s  and  have  the  same  input  (and 
no  failures)  through  time  fc  —  1.  It  ibllows  that  F(i,^)  =  V{ui,£)‘  It  follows 
by  Lemma  3.16  that  (i,/)  ~  ("ul)^)  in.  the  omissions  model,  and  hence  that 
(a,  £)  ~  {u'l,  t)  in  the  generalized  omissions  model  as  well.  Since  (r,  /)  ~  (s,  ^), 
it  follows  that  for  each  (pi  we  have  (r,/)  ~  (uj,^)  and  (uj,/)  ^  <Pi-  Therefore, 
for  each  tpi  we  have  (r,^)  ^  and  hence  {r,t)  ^  Vi  O 

We  have  seen  that,  eis  a  result  of  the  uncertainty  about  the  failure  pattern, 
the  complexity  of  determining  whether  admissible  facts  are  common  knowl¬ 
edge  is  dramatically  greater  in  this  model  than  in  more  benign  models.  It  is 
conceivable,  however,  that  this  gap  in  complexity  is  due  to  the  fact  that  faulty 
processors  may  fail  both  to  send  and  to  receive  messages,  and  not  merely  due 
to  the  uncertainty  about  the  failure  pattern.  We  can  show,  however,  that  it 
is  precisely  due  to  this  uncertainty  that  we  observe  this  complexity  gap.  Con¬ 
sider  the  closely  related  failure  model  we  have  termed  generalized  omissions 
with  information,  a  model  differing  from  the  generalized  omissions  model  in 
that  a  processor  not  receiving  a  message  can  determine  whether  it  or  the 
sender  is  at  fault.  We  can  show  that  the  construction  used  in  the  omissions 
model  can  also  be  used  in  this  model  to  yield  a  set  of  states  V{r,  t)  completely 
characterizing  what  facts  are  common  knowledge  at  the  point  (r,  1). 

Proposition  3.30;  In  generalized  omissions  with  information,  we  have  (r,  ^)  |= 
C//<P  iff  {'>'',£)  [=  9?  for  all  r'  satisf3ring  V{r',t)  =  V{r,l). 

AU  of  the  proofs  in  the  omissions  model  hold  when  generalized  to  this  model, 
with  the  exception  that  the  construction  must  be  started  with  a  nonfaulty 
processor.  (In  particular,  Lemma  3.13  holds  only  when  the  processors  p  and  q 
are  processors  that  do  not  fail  to  receive  messages.)  This  exception  says  that 
faulty  processors  may  not  be  able  to  perform  all  actions  performed  by  the 
nonfaulty  processors,  but  this  is  no  surprise  since  the  same  is  true  in  the  re¬ 
ceiving  omissions  model.  Furthermore,  the  computation  of  the  sets  Bi  in  the 
construction  now  depends  not  only  on  the  undelivered  messages,  but  also  on 
the  additional  information  that  receiving  processors  obtain  regarding  blame 
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for  the  undelivered  messages.  As  in  the  omissions  model,  this  construction 
yields  a  method  of  deriving  efficient  tests  for  common  knowledge  of  certain 
facts.  Thus,  it  is  again  possible  to  design  efficient  optimal  protocols: 


Theorem  3.31:  Let  C  be  an  implementable  practical  simultaneous  choice. 
In  genereilized  omissions  with  information,  there  is  a  polynomial-time  optimal 
protocol  for  C. 


This  shows  that  it  is  precisely  the  uncertednty  about  the  failure  pattern  that 
is  responsible  for  the  observed  gap  in  complexity,  and  not  merely  the  fact  that 
faulty  processors  may  fail  both  by  failing  to  send  and  to  receive  messages. 

The  uncertainty  about  the  failure  pattern  in  the  generalized  omissions 
model  adds  a  new  combinatorial  structure  to  the  similarity  graph  in  this 
model  that  does  not  exist  in  other  variants  of  the  omissions  model.  Since 


it  is  possible  to  assign  failure  to  processors  in  a  number  of  different  ways 
consistent  with  a  pattern  of  undelivered  messages,  it  is  possible  to  play  a 
solitaire  version  of  a  "pebbling  game”  with  the  failure  pattern  when  con¬ 
structing  paths  in  the  similarity  graph,  showing  that  one  point  is  similar  to 
another  point  by  alternatively  assigning  responsibility  for  undelivered  mes¬ 
sages  to  the  sender  and  to  the  receiver.  In  fact,  in  addition  to  increasing  the 
difficulty  of  determining  whether  a  fact  is  common  knowledge  at  a  point,  this 
new  combinatoried  structure  has  interesting  effects  on  when  facts  become 
common  knowledge.  Recall  from  the  discussion  at  the  end  of  Section  6.1 
that  the  similarity  graph  in  the  omissions  model  is  simply  an  extension  of 
the  similarity  graph  in  the  crash  failure  model,  two  points  with  crash  failure 
patterns  being  similar  in  the  crash  failure  model  iff  they  are  in  the  omissions 
model.  As  a  result,  our  optimal  protocol  Tc  in  the  omissions  model  is  also 
an  optimal  protocol  when  restricted  to  runs  of  the  crash  failure  model.  In 
the  generalized  omissions  model,  however,  the  similarity  graph  is  not  merely 
an  elaboration  of  the  similarity  graph  in  the  omissions  model:  A  connected 
component  in  the  similarity  graph  of  the  generalized  omissions  model  may 
contain  several  distinct  connected  components  from  the  omissions  model. 
As  a  result,  optimal  protocols  in  the  generalized  omissions  model  are  not 


ucCcaSaiily  Optimal  whcli  IcStliCtcd  to  lUilS  of  the  OmiSslo&S  modcl,  &S  the 


following  theorem  shows  is  the  case  for  simultaneous  Byzantine  agreement. 
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Theorem  3.32:  No  optimal  protocol  for  simultaneous  Byzantine  agreement 
in  the  generalized  omissions  model  is  optimal  when  restricted  to  runs  of  the 
omissions  model. 

Proof:  Let  tt  be  the  failure  pattern  (involving  at  least  2t  processors)  in 

which  processor  fails  to  send  to  processor  pt+i  in  round  1  (for  i  =  1, . . . ,  t) 
and  no  other  failures  occur.  Notice  that  tt  is  a  failure  pattern  of  both  the 
omissions  model  and  the  generalized  omissions  model.  Let  r  be  a  run  of  a  full- 
information  protocol  with  the  failure  pattern  tt.  We  claim  that  some  nonvalid 
fact  about  the  initial  configuration  (in  fact,  the  entire  initial  configuration) 
must  be  common  knowledge  at  (r,  2)  in  the  omissions  model;  and  that  no 
nonvalid  fact  about  the  initial  configuration  is  common  knowledge  at  (r,  2) 
in  the  generalized  omissions  model,  from  which  it  follows  by  Corollary  3.3 
that  no  nonvalid  fact  about  the  initial  configuration  is  common  knowledge 
at  time  2  in  any  run  with  failure  pattern  tt  of  a  protocol  in  the  generalized 
omissions  model.  In  the  first  case,  any  optimal  protocol  for  simultaneous 
Byzantine  agreement  in  the  omissions  model  (the  protocol  .Fc,  for  example) 
halts  at  time  2.  In  the  second  case.  Lemma  3.1  implies  that  no  protocol  for 
simultaneous  Byzantine  agreement  in  the  generalized  omissions  model  can 
halt  at  time  2.  Therefore,  no  optimal  protocol  for  simultaneous  Byzantine 
agreement  in  the  generalized  omissions  model  is  optimal  when  restricted  to 
runs  of  the  omissions  model. 

To  see  that  some  nonvalid  fact  about  the  initial  configuration  becomes 
common  knowledge  at  (r,  2)  in  the  omissions  model,  notice  that  the  set  y(r,  2) 
is  nonempty.  The  result  follows  by  Corollary  3.18. 

To  see  that  no  nonvalid  fact  about  the  initial  configuration  becomes  com¬ 
mon  knowledge  at  (r,  2)  in  the  generalized  omissions  model,  it  is  enough  to 
show  that  (r,  2)  ~  (s,  2)  for  all  failure-free  runs  s.  Shifting  “pebbles,”  no¬ 
tice  that  (r,  2)  ~  (r',  2)  where  r'  differs  from  r  only  in  that  processor  pi  is 
nonfaulty  in  r'  and  it  is  processor  pt+i  that  fails  to  receive  the  undelivered 
message  from  pi  to  pt+i  in  round  1.  Using  Lemma  3.29  we  can  show  that 
(r',  2)  ~  (r",  2)  where  r"  differs  from  r'  only  in  that  processor  pt+i  does  not 
fail  to  receive  the  message  from  processor  pi  in  round  one.  Repeating  this 
procedure  we  can  show  that  (r",  2)  (u,  2)  where  u  is  the  failure-free  run 

with  the  input  of  r".  It  is  now  possible  to  use  Lemma  3.29  to  show  that 
(s,  2)  ~  (s',  2)  for  all  failure-free  runs  s  and  s'.  It  follows  that  (r,  2)  ~  (s,  2) 
for  all  failure-free  runs  s,  and  hence  that  no  nonvalid  fact  about  the  initial 
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configuration  is  common  knowledge  at  (r,  2).  □ 

We  remark  that,  for  most  simultaneous  choice  problems,  the  counterexample 
given  in  the  proof  of  Theorem  3.32  can  be  used  to  show  that  no  optimal 
protocol  for  this  problem  in  the  generalized  omissions  model  is  optimal  when 
restricted  to  runs  of  the  omissions  model. 

The  results  of  this  section  indicate  that  the  generalized  omissions  model 
seems  to  be  a  natural  failure  model  that  already  displays  some  of  the  complex 
behavior  of  the  more  malicious  models  such  as  the  Byzantine  failure  models. 
By  this  we  mean  that,  just  as  a  processor  in  a  Byzantine  model  may  be 
confused  by  which  of  two  other  processors  are  actually  faulty  processors,  a 
processor  in  the  omissions  model  hearing  of  a  lost  message  may  be  confused 
by  whether  the  sender  or  the  receiver  of  the  lost  message  is  at  fault.  We 
believe  that  this  model  is  therefore  a  natural  candidate  for  further  study  as 
an  intermediate  model  on  the  way  to  understanding  the  mysteries  of  fault 
tolerance  in  truly  malicious  failure  models. 


3.6  Conclusions 


This  chapter  applies  the  theory  of  knowledge  in  distributed  systems  to  the 
design  and  analysis  of  fault-tolerant  protocols  for  a  large  and  interesting  class 
of  problems.  This  is  a  good  example  of  the  power  of  applying  reasoning  about 
knowledge  to  obtain  general,  unifying  results  and  a  high-level  perspective  on 
issues  in  the  study  of  unreliable  systems. 

Given  the  effectiveness  of  a  knowledge-based  analysis  in  the  case  of  simul¬ 
taneous  actions  (see  also  [DM86]),  it  would  be  interesting  to  know  whether 
such  an  analysis  can  shed  similar  light  on  the  case  of  eventually  coordinated 
actions.  Dolev,  Reischuk,  and  Strong  [DRS82]  show  that  the  problem  of 
performing  eventually  coordinated  actions  in  synchronous  systems  is  quite 
different  from  that  of  performing  simultaneous  actions.  For  example,  they 
show  that  while  t  -f  1  is  a  general  lower  bound  on  the  number  of  rounds 
required  to  reach  simultaneous  agreement  even  when  the  number  /  of  pro¬ 
cessors  actually  failing  is  less  that  t,  eventual  agreement  can  be  reached  in 
as  few  as  /  -H  2  rounds  if  the  number  of  processors  is  sufficiently  large.  In 
addition  to  coiuiuon  knowledge,  an  analysis  of  eventually  coordinated  actions 
may  be  able  to  make  good  use  of  the  notion  of  eventual  common  knowledge 
(see  [HM84,  Mos86]).  We  note  that  it  is  possible  to  show  that  for  eventual 
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choice  problems  there  do  not,  in  general,  exist  protocols  that  are  optimal 
in  all  runs.  For  example,  one  can  give  two  protocols  for  (eventual)  Byzan¬ 
tine  agreement  with  the  property  that  for  every  operating  environment  one  of 
these  protocols  will  reach  Byzantine  agreement  (i.e.,  all  processors  will  decide 
on  a  value)  by  time  2  at  the  latest.  However,  if  t  >  1,  it  is  well-known  that 
no  single  protocol  can  guuantee  that  agreement  will  be  reached  by  time  2 
in  all  runs.  What  is  the  best  notion  of  optimality  that  can  be  achieved  in 
eventual  coordination? 

We  provide  a  method  of  deriving  an  optimal  protocol  for  any  given  tm- 
plementable  specification  of  a  simultaneous  choice  problem.  However,  in  this 
work,  we  have  completely  sidestepped  the  interesting  question  of  character¬ 
izing  the  problems  that  are  and  are  not  implementable  in  different  failure 
models.  We  believe  that  a  general  analysis  of  the  implementability  of  prob¬ 
lems  involving  coordinated  actions  in  different  failure  models  will  expose 
many  of  the  important  operational  differences  between  the  models.  As  an 
example,  our  specification  of  the  distributed  firing  squad  problem  in  the  in¬ 
troduction  is  implementable  in  the  variants  of  the  omissions  model,  but  is 
not  implementable  in  more  malevolent  models,  in  which  a  faulty  processor 
can  falsely  claim  to  have  received  a  “start”  message  and  otherwise  seem  to 
behave  correctly  (see  [BL87]  and  [CDDS85]  for  definitions  of  versions  of  the 
firing  squad  problem  that  are  implementable  in  the  more  malicious  models). 

In  the  generalized  omissions  model,  we  have  shown  how  to  derive  optimal 
protocols  for  nontrivial  simultaneous  choice  problems,  requiring  processors 
to  perform  polynomial-space  computations  between  consecutive  rounds.  We 
have  also  shown  an  NP-hard  lower  bound  for  any  communication-efficient 
protocol  for  such  a  problem  that  is  optimal  in  all  runs.  Determining  the 
precise  complexity  of  this  task  is  a  nontrivial  open  problem,  due  to  the  inter¬ 
esting  combinatorial  structure  underlying  the  generalized  omissions  model. 
It  would  also  be  interesting  to  extend  our  study  to  more  malicious  failure 
models,  such  as  the  Byzantine  and  the  authenticated  Byzantine  models  (see 
[Fis83]).  It  is  not  immediately  clear  whether  the  notion  of  a  failure  pattern 
can  be  defined  in  these  models  in  a  protocol-independent  fashion.  Thus,  it 
is  not  clear  that  the  notion  of  optimality  in  all  runs  is  well-defined  in  such 
models.^  If  such  definitions  are  possible,  we  believe  that  the  NP -hardness 

^Quite  recently,  Michel  [Mic89]  has  shown  in  the  Bysantine  model  how  to  map  runs 
of  one  protocol  to  runs  of  another  protocol  in  a  way  that  respects  processor  failures,  and 
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result  &om  the  generalized  omissions  model  should  extend  to  these  models. 
(Our  proof  does  show  that  testing  for  common  knowledge  in  runs  of  a  full- 
information  protocol  .F  in  both  models  is  NP-hard.)  Capturing  the  precise 
combinatorial  structure  of  the  similarity  graph  in  these  models  is  bound  to 
expose  many  of  the  mysterious  properties  of  the  models.  We  believe  that 
this  is  an  important  first  step  in  understanding  these  models. 

As  we  have  seen,  there  axe  no  computationally-efficient  optimal  protocols 
for  simxiltaneous  choice  problems  in  the  generalized  omissions  model.  Since 
it  is  unreasonable  to  expect  polynomial-time  processors  to  perform  NP-hard 
computations,  it  is  natural  to  zisk  what  is  the  earliest  time  at  which  simulta¬ 
neous  actions  can  be  performed  by  such  processors?  Are  optimal  protocols 
for  such  processors  guaranteed  to  exist?  In  what  sense  are  these  protocols 
optimal?  How  can  they  be  derived?  In  contrast  to  the  simpler  failure  mod¬ 
els,  the  answers  to  these  questions  in  the  generalized  omissions  model  no 
longer  seems  to  be  as  closely  related  to  the  information-theoretic  definitions 
of  knowledge  and  common  knowledge  given  in  Chapter  2,  since  they  do  not 
account  for  the  polynomial-time  limitations  on  processors’  computational  re¬ 
sources. 


A  major  challenge  motivated  by  these  questions,  therefore,  is  the  elab¬ 
oration  of  the  theory  of  knowledge  given  in  Chapter  2  to  include  notions  of 
resource-bounded  knowledge  that  would  provide  us  with  appropriate  tools 
for  analyzing  such  questions.  Such  a  theory  wordd  provide  notions  such  as 
polynomial-time  knowledge  and  polynomial-time  common  knowledge,  which 
would  correspond  to  the  actions  and  the  simultaneous  actions  that  polynomial¬ 
time  processors  can  perform.  Note  that  the  fact  that  (suboptimal)  polynomial¬ 
time  protocols  for  the  simultaneous  Byzantine  agreement  problem  exist  even 
in  the  more  malicious  failure  models  implies  that,  given  the  right  notions, 
many  relevant  facts  should  become  polynomial-time  common  knowledge. 

Recently,  in  [Mos88],  Moses  has  risen  to  this  challenge  and  proposed 
notions  of  resource-bounded  knowledge  based  on  the  existence  of  tests  for 
knowledge  similar  to  the  tests  for  common  knowledge  used  here.  Loosely 
speaking,  for  example,  a  processor  is  said  to  polynomial-time  know  a  fact 
^  at  a  point  if  it  knows  <p  at  this  point  and  there  exists  a  polynomial-time 
test  that  at  all  points  of  the  system  correctly  determines  whether  the  proces¬ 


sor  knows  ip.  Using  this  definition  of  polynomial-time  knowledge,  he  shows 


how  to  define  a  notion  of  optimality  with  respect  to  these  mappings. 
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that  polynomial-time  common  knowledge  of  certain  facts  is  a  necessary  con¬ 
dition  for  processors  to  perform  simultaneous  actions,  and,  using  this  and 
the  construction  in  the  proof  of  Theorem  3.26,  he  is  able  to  prove  that  there 
can  be  no  polynomial-time  protocol  for  simxdtaneous  Byzantine  agreement 
in  the  generalized  omissions  model  that  is  optimal  in  all  runs  with  respect  to 
polynomial-time  protocols.  We  note  that  other  related  notions  of  resource- 
bounded  knowledge  have  appeared  in  [HMT88]  and  [FZ88].  While  each  of 
these  definitions  is  weU-motivated  in  each  of  these  works,  however,  under¬ 
standing  which  of  these  definitions  is  in  general  the  “correct”  definition  is 
still  an  open  problem.  We  will  return  to  this  topic  again  in  Chapter  5  where 
we  study  cryptographic  protocols  in  terms  of  a  form  of  resource-bounded 
knowledge. 


Chapter  4 

Knowledge,  Probability,  and 
Adversaries 


In  this  chapter,  we  explore  the  relationship  between  knowledge  and  proba¬ 
bility  in  probabilistic  systems. 


4.1  Introduction 

In  a  number  of  areas  of  research,  including  distributed  computing,  artificial 
intelligence,  and  economics,  we  are  faced  with  the  problem  of  understanding 
a  system  of  agents  (possibly  processors  in  a  distributed  network  or  consumers 
in  an  economic  model)  that  interact  in  some  way.  Often,  probability  plays  a 
role  in  this  interaction:  in  the  context  of  game  theory,  for  example,  an  agent 
might  toss  a  coin  in  order  to  determine  its  next  move  in  a  game.  As  we  try 
to  understand  these  probabilistic  systems,  we  often  find  ourselves  reasoning, 
at  least  informally,  about  knowledge  and  probability  and  their  interaction. 
Consider,  for  example,  a  probabilistic  primality-testing  algorithm.  Such  an 
algorithm  might  guarantee  that  if  the  input  n  is  a  composite  number,  then 
with  high  probability  the  algorithm  will  find  a  “witness”  that  can  be  used  to 
verify  that  n  is  composite.  Loosely  speaking,  we  reason,  if  an  agent  runs  this 
algorithm  on  input  n  and  the  algorithm  fails  to  find  such  a  witness,  then  the 

This  chapter  is  joint  work  with  Joe  Halpem.  A  preliminary  version  of  this  work  will 
appear  in  Proceedings  of  ihe  8th  Annual  ACM  Symposium  on  Principles  of  Distributed 
Computing,  August,  1989. 
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agent  knows  that  n  is  almost  certainly  prime,  since  the  agent  is  guaranteed 
that  the  algorithm  would  almost  certainly  have  found  a  witness  had  n  been 
composite. 

A  number  of  recent  papers  have  tried  to  formalize  this  sort  of  reason¬ 
ing  about  knowledge  and  probability.  Fagin  and  Halpern  [FH88]  present  an 
abstract  model  for  knowledge  and  probability  in  which  they  assign  to  each 
agent  and  state  a  probability  space  to  be  used  when  computing  the  prob¬ 
ability,  according  to  that  agent  at  that  state,  that  a  formula  <p  is  true.  In 
their  framework,  the  problem  of  modeling  knowledge  and  probability  reduces 
to  choosing  this  assignment  of  probability  spaces.  Although  they  show  that 
more  than  one  choice  may  be  reasonable,  they  do  not  tell  us  how  to  make 
this  choice.  One  particular  (and  quite  natural)  choice  is  made  in  [FZ88]  and 
some  arguments  are  presented  for  its  appropriateness;  another  is  made  in 
[HMT88]  (Chapter  5)  and  used  to  analyze  interactive  proof  systems.  It  is 
not  initially  clear,  however,  which  choice  is  most  appropriate. 

In  this  chapter,  we  clarify  the  issues  involved  in  choosing  the  right  assign¬ 
ment  of  probability  spaces.  We  argue  that  no  single  assignment  is  appropriate 
in  all  contexts.  Different  assignments  can  be  viewed  as  most  appropriate  in 
the  context  of  betting  against  different  adversaries.  Thinking  in  terms  of 
such  betting  games,  a  statement  such  as  “I  know  event  E  will  happen  with 
probability  at  least  a”  is  meaningless  until  the  powers  of  our  opponent  in 
the  betting  game  have  been  specified.  A  strategy  that  will  win  a  game  with 
probability  .99  against  a  weak  adversary  may  win  the  game  with  probability 
only  .33  against  a  strong  adversary.  Consequently,  even  if  we  are  told  that 
a  certain  strategy  will  win  the  game  with  probability  .99  against  a  certain 
adversar}’’,  we  cannot  tell  whether  it  is  a  good  strategy  until  we  know  the 
powers  of  this  certain  adversary. 

We  find,  however,  that  the  notion  of  an  opponent  in  a  betting  game 
does  not  fully  capture  all  the  subtleties  that  arise  when  modeling  knowledge 
and  probability  in  distributed  systems.  We  present  a  fcamework  with  three 
different  types  of  adversaries,  each  playing  a  fundamentally  different  role. 
We  briefly  describe  these  roles  here,  and  explore  them  in  greater  depth  in 
the  rest  of  the  chapter. 

When  we  analyze  probabilistic  protocols,  we  typically  do  so  in  terms  of 
probability  distributions  on  the  runs  or  executions  of  the  protocol.  WTien  we 
say  a  protocol  is  correct  with  probability  .99,  we  mean  the  protocol  will  do  the 
right  thing  in  .99  of  the  runs.  A  closer  analysis  of  the  situation  reveals  some 


4.2.  INTRODUCTION 


101 


subtleties.  In  fact,  we  do  not  have  a  probability  distribution  on  the  entire 
set  of  runs.  In  probabilistic  algorithms  for  testing  primality  such  as  those 
of  Rabin  [RabSO)  and  Solovay  and  Strassen  [SS77],  for  example,  we  typically 
do  not  assume  a  distribution  on  the  inputs  (the  numbers  to  be  tested).  The 
only  source  of  probability  comes  £cora  the  coins  tossed  during  the  execution 
of  the  algorithm.  This  means  that  for  every  fixed  input,  there  is  a  probability 
space  on  the  runs  of  the  protocol  on  that  input,  rather  than  there  being  one 
probability  space  on  the  set  of  all  runs.  We  can  view  the  choice  of  input 
as  a  nondeterzninistic  choice  to  which  we  do  not  assign  a  probability.  Thus, 
we  prove  the  algorithm  works  correctly  with  high  probability  for  each  initial 
nondeterministic  choice.  A  similar  situation  arises  in  probabilistic  protocols 
that  are  designed  to  work  in  the  presence  of  a  nondeterministic  (perhaps 
adversarial)  scheduler  (e.g.,  [Rab82]).  Again,  we  do  not  wish,  to  assume  some 
probability  of  playing  a  given  scheduler.  Instead,  we  factor  out  the  choice 
of  scheduler  and  prove  that  the  protocol  is  correct  with  high  probability  for 
each  scheduler. 


This,  then,  is  the  role  played  by  the  first  type  of  adversary:  to  factor 
out  the  nondeterminism  in  the  system,  allowing  us  to  place  a  well-defined 
probability  on  the  set  of  runs  for  each  fixed  adversary.  We  remark  that 
this  need  to  factor  out  the  nondeterminism  is  implicit  in  most  analyses  of 
probabilistic  protocols,  and  appears  explicitly  in  [Rab82,  Var85,  FZ88]. 

The  probability  on  the  runs  can  be  viewed  as  giving  us  an  a  priori  prob¬ 
ability  of  an  event,  before  the  protocol  is  run.  However,  the  probability  an 
agent  places  on  runs  will  in  general  change  over  time,  as  a  function  of  infor¬ 
mation  received  by  the  agent  in  the  course  of  the  execution  of  the  protocol. 
New  subtleties  arise  in  analyzing  this  probability. 

Consider  a  situation  with  three  agents  Pi,  Ps,  and  pa.  Agent  pa  tosses  a 
fair  coin  at  time  1  and  observes  the  outcome  at  time  2,  but  agents  pi  and  ps 
never  learn  the  outcome.  What  is  the  probability  according  to  pi  that  the 
coin  lands  heads?  Clearly  at  time  1  it  shoitld  be  1/2.  What  about  at  time  2? 
There  is  one  argument  that  says  the  answer  should  be  1/2.  After  all,'agent 
Pi  does  not  learn  any  more  about  the  coin  as  a  result  having  tossed  it,  so  why 
should  its  probability  change?  Another  argument  says  that  after  the  coin  has 
been  tossed,  it  does  not  make  sense  to  say  that  the  probability  of  heads  is  1/2. 


The  coin  has  either  landed  heads  or  it  hasn’t,  so  the  probability  of  the  coin 


landing  heads  is  either  0  or  1  (although  agent  pi  does  not  know  which).  This 


point  of  view  appears  in  a  number  of  papers  in  the  philosophical  literature 
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(for  example,  [vP80,  Lew80]),  Interestingly,  the  same  issue  arises  in  quantum 
mechanics,  in  Schrodinger’s  famous  cat-in-the-box  thought  experiment  (see 
[Pag82]  for  a  discussion). 

We  claim  that  these  two  choices  of  probability  are  best  explained  in  terms 
of  betting  games.  At  time  1,  agent  pi  should  certainly  be  willing  to  accept 
an  offer  ixom  either  pa  or  pa  to  bet  $1  for  a  payoff  of  $2  if  the  coin  lands 
heads  (assuming  pi  is  risk  neutral).^  Half  the  time  the  coin  will  land  heads 
and  Pi  will  be  $1  ahead,  and  half  the  time  the  coin  will  land  tails  and  pi 
wli  lose  $1,  but  on  average  pi  will  come  out  even.  On  the  other  hand,  agent 
i  is  clearly  not  willing  to  accept  such  an  offer  from  P2  at  time  2  (since  p2 
would  presumably  offer  the  bet  only  when  it  is  sure  it  will  win),  although  it 
is  still  willing  to  accept  this  bet  from  ps.  The  point  here  is  that  in  a  betting 
game,  not  only  is  your  knowledge  important,  but  also  the  knowledge  of  the 
opponent  offering  the  bet.  Betting  games  are  not  played  in  isolation! 

Thus,  the  role  played  by  the  second  type  of  adversary  in  our  framework  is 
to  model  the  knowledge  of  the  opponent  offering  a  bet  to  an  agent  at  a  given 
point  in  the  run.  One  obvious  choice  is  to  assume  you  are  playing  against 
someone  whose  knowledge  is  identical  to  your  own.  This  is  what  decision 
theorists  implicitly  do  when  talking  about  an  agent’s  posterior  probabilities 
[BG54];  it  is  also  how  we  cun  understand  the  choice  of  probability  space 
made  in  [FZ88].  By  way  of  contrast,  the  choice  in  [HMT88]  corresponds  to 
playing  someone  who  has  complete  knowledge  about  the  past  and  knows  the 
outcome  of  the  coin  toss;  this  corresponds  to  the  viewpoint  that  says  that 
when  the  coin  has  landed,  the  probability  of  heads  is  either  0  or  1  (although 
you  may  not  know  which), 

A  further  complication  arises  when  analyzing  asynchronous  systems.  In 
this  case  there  is  a  precise  sense  in  which  the  agent  does  not  even  know  exactly 
when  the  event  to  which  it  would  like  to  assign  a  probability  is  being  tested. 
Thus  we  need  to  consider  a  third  type  of  adversary  in  asynchronous  systems, 
whose  role  is  to  choose  the  time.  To  illustrate  the  need  for  this  third  type 
of  adversary,  we  give  an  example  of  an  asynchronous  system  where  there  are 
a  number  of  plausible  answers  to  the  question  “What  is  the  probability  the 
most  recent  coin  toss  landed  heads?”.  It  turns  out  that  the  different  answers 
correspond  to  different  adversaries  choosing  the  times  to  perform  the  test 

^Informally,  an  agent  is  said  to  be  risk  neutral  if  it  is  willing  to  accept  all  bets  where 
its  expectation  of  winning  is  nonnegative. 
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in  different  ways.  We  remark  that  the  .  Vf  of  asynchronous  systems  is  also 
considered  in  [FZ88].  We  can  understand  the  assignment  of  “confidence” 
made  there  as  corresponding  to  playing  against  a  certain  class  of  adversaries 
of  this  third  type. 

having  shown  that  different  definitions  of  probabilistic  knowledge  corre¬ 
spond  to  different  classes  of  adversaries,  we  show,  given  a  class  of  adversaries, 
how  to  construct  a  definition  most  appropriate  for  this  class.  We  formalize 
our  intuition  concerning  the  probability  an  agent  assigns  to  an  event  in  terms 
of  a  betting  game  between  the  agent  and  an  opponent.  We  show  that  our 
“most  appropriate”  definition  has  the  property  that  it  enables  an  agent  to 
break  even  in  this  game,  and  any  other  definition  with  this  property  must 
correspond  to  an  opponent  even  more  powerful  than  the  actual  opponent. 
These  results  form  the  technical  core  of  this  chapter. 

The  rest  of  the  chapter  is  organized  as  follows.  In  the  next  section, 
Section  4.2,  we  consider  the  problem  of  putting  a  probability  on  the  runs 
of  a  system;  this  is  where  we  need  the  first  type  of  adversary,  to  factor  out 
the  nondeterministic  choices.  In  Section  4.3  we  start  to  consider  the  issue 
of  how  probability  should  change  over  time.  In  Section  4.4  we  consider  th« 
choices  that  must  be  made  in  a  general  definition  of  probabilistic  knowiedge. 
In  Section  4.6  we  consider  particular  choices  of  probability  assignments  that 
seem  reasonable  in  synchronous  systems.  Here  we  consider  the  second  type  of 
adversary,  representing  the  knowledge  of  the  opponent  in  the  betting  game. 
In  Section  4.6,  we  consider  asynchronous  systems,  where  we  also  have  to 
consider  the  third  type  of  adversary.  In  Section  4.7  we  apply  our  ideas  to 
analyzing  the  coordinated  attack  problem,  showing  how  different  notions  of 
probability  correspond  to  different  levels  of  guzurantees  in  coordinated  attack. 
The  chapter  ends  with  two  appendices.  In  Appendix  4.A  we  give  the  proofs 
of  the  results  claimed  in  the  chapter,  and  in  Appendix  4.B  we  discuss  some 
interesting  secondary  observations  related  to  the  rest  of  the  chapter. 


4.2  Probability  on  runs 


In  order  to  discuss  the  probability  of  events  in  a  distributed  system,  we 


sp 


Af 


place  a  reasonable  probability  distribution  on  the  runs  of  a  system,  it  is 
necessary  to  postulate  the  existence  of  the  first  type  of  adversary  sketched 
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in  the  introduction. 

Consider  the  simple  system  consisting  of  a  single  agent  who  tosses  a  fair 
coin  once  and  halts.  This  system  consists  of  two  runs,  one  in  which  the 
coin  comes  up  heads  and  one  which  the  coin  comes  up  tails.  The  coin 
toss  induces  a  very  natural  distribution  on  the  t\tro  runs:  each  is  assigned 
probability  1/2. 

Now  consider  the  system  (suggested  by  Moshe  Vardi;  a  variant  appears  in 
[FZ88])  consisting  of  two  agents,  pi  and  Pi,  where  pi  has  an  input  bit  and  two 
coins,  one  fair  coin  landing  heads  with  probability  1/2  and  rue  bias-d  coin 
landing  heads  with  probability  2/3.  If  the  input  bit  is  0,  />)  ^o^se•;  the  fair 
coin  once  and  halts.  If  the  input  bit  is  1,  pi  tosses  the  biased  coin  Os'**!  halts. 
This  system  consists  of  four  runs  of  the  form  (6,c),  whe’.-e  h  is  the  value  of 
the  input  bit  and  c  is  the  outcome  of  the  coin  toss.  Whal  i-;  ibe  appropiiatc 
probability  distribution  on  the  runs  of  this  system?  For  exa^ip'c,  'mat  is  the 
probability  of  heads? 

Clearly  the  conditional  probability  of  heads  given  that  the  input  hit  is  0 
should  be  1/2,  while  the  conditional  probability  of  heads  given  the  input  bit 
is  1  should  be  2/3.  But  what  is  the  unconditional  probability  of  heads?  If  we 
are  given  a  distribution  on  the  inputs,  then  it  is  easy  to  answer  this  question. 
If  we  assume,  for  example,  that  0  and  1  are  equally  likely  as  input  values, 
then  we  can  compute  that  the  probability  of  heads  is  |  j  j  •  |  =  ^ .  If  we 
are  not  given  a  distribution  on  the  inputs,  then  the  question  has  no  obvious 
answer.  It  is  tempting  to  assume,  therefore,  that  such  a  distribution  exists. 
Often,  however,  assuming  a  particular  fixed  distribution  on  inputs  leads  to 
results  about  a  system  that  are  simply  too  weak  to  be  of  any  use.  Knowing 
an  algorithm  produces  the  correct  answer  in  .99  of  its  runs  when  all  inputs 
are  equally  likely  is  of  no  use  when  the  algorithm  is  used  in  the  context  of  a 
different  distribution  on  the  inputs. 

To  overcome  this  problem,  one  might  be  willing  to  assume  the  existence 
of  some  fixed  but  unknown  distribution  on  the  inputs.  Proving  that  an 
algorithm  produces  the  correct  answer  in  .99  of  the  runs  in  the  context  of  an 
unknown  distribution,  however,  is  no  easier  than  proving  that  for  each  fixed 
input  the  algorithm  is  correct  in  .99  of  the  runs,  since  it  is  always  possible  for 
the  unknown  distribution  to  place  all  the  probability  on  the  input  for  which 
the  algorithm  performs  particularly  poorly.  Here  the  advantage  of  viewing 
the  system  as  a  single  probability  space  is  lost,  since  this  is  precisely  the 
proof  technique  one  would  use  when  no  distribution  is  assuined  in  the  first 
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place.  Moreover,  assuming  the  existence  of  some  unknown  distribution  on 
the  inputs  simply  moves  all  problems  arising  from  nondeterminism  up  one 
level.  Although  we  have  a  distribution  on  the  space  of  input  values,  we  have 
no  distribution  on  the  space  of  probability  distributions. 

This  discussion  leads  us  to  conclude  that  some  choices  in  a  distributed 
system  must  be  viewed  as  inherently  nondeterministic  (or,  perhaps  better, 
nonprobabilistic),  and  that  it  is  inappropriate,  both  philosophically  and  prag¬ 
matically,  to  model  probabilistically  what  is  inherently  nondeterministic.  But 
then  how  can  we  reason  probabilistically  about  a  system  involving  both  non¬ 
deterministic  and  probabilistic  choices?  Our  solution — which  is  essentially 
a  formalization  of  the  standard  approach  taken  in  the  literature — is  to  fac¬ 
tor  out  initial  nondeterministic  events,  and  view  the  system  as  a  collection  of 
subsystems,  each  with  its  natural  probability  distribution.  In  the  coin  tossing 
example  above,  we  would  consider  two  probability  spaces,  one  corresponding 
to  the  input  bit  being  0  and  the  other  corresponding  to  the  input  bit  being  1. 
The  probability  of  heads  is  1/2  in  the  first  space  and  2/3  in  the  second.^ 

We  want  to  stress  that  although  this  example  may  seem  artificial,  analo¬ 
gous  examples  frequently  arise  in  the  literature.  In  a  probabilistic  primality- 
testing  algorithm  [RabSO,  SS77],  for  example,  we  do  not  want  to  assume 


^Often,  even  in  the  presence  of  nondeteiminism,  we  can  impose  a  meaningful  distribu¬ 
tion  on  the  runs  of  a  system  without  factoring  the  system  into  subsystems.  However,  the 
resulting  distribution  still  may  not  capture  all  of  our  intuition.  The  problem  in  the  preced¬ 
ing  example  is  that  probabilistic  events  (the  coin  toss)  depend  on  nonprobabilistic  events 
(the  input  bit).  Suppose,  however,  the  agent  tosses  a  fair  coin  regardless  of  the  input 
bit’s  value.  Now  it  is  natural  to  assign  probability  1/2  to  each  of  the  events  {(1,  h),  (0,  h)} 
and  {(l)t))  (0,t)}  that  the  coin  lands  head  and  tails,  respectively.  Consider,  however,  the 
situation  (discussed  in  [FH88,  HMT88])  where  an  agent  performs  a  given  action  a  iff  the 
input  bit  is  1  and  the  coin  landed  heads,  or  the  input  bit  is  0  and  the  coin  landed  tails.  It 
is  natural  to  argue  that  the  probability  the  agent  performs  the  action  a  is  also  1/2:  if  the 
input  bit  is  1  then  with  probability  1/2  the  coin  will  land  heads  and  a  will  be  performed; 
and  if  the  input  bit  is  0  then  with  probability  1/2  the  coin  will  land  tails  and  a  will  be 
performed.  Unfortunately,  our  “natural”  distribution  on  the  runs  of  the  system  does  not 
support  this  line  of  reasoning,  since  this  distribution  does  not  assign  a  probability  to  the 
set  {(1,  h),  (0,  t)}  corresponding  to  the  performance  of  a.  In  fact,  it  is  not  hard  to  see  that 
if  we  could  assign  this  set  a  probability,  then  we  would  be  able  to  assign  a  probability  to 
having  the  input  bit  set  to  0  or  1.  But  the  setting  of  the  input  bit  was  assumed  to  be 
nondeterministic!  Again,  however,  if  we  factor  out  this  initial  nondeterminism,  we  can 


view  the  systeiii  as  two  sUusystems  witn  obvious  associaiea  piooaomiy  msinouiions,  ana 


within  each  subsystem  the  action  a  is  performed  with  probability  1/2.  And  this  is  precisely 


what  the  reasoning  underlidng  our  intuition  is  implicitly  doing. 
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a  probability  distribution  on  the  inputs.  We  want  to  know  that  for  each 
choice  of  input,  the  algorithm  gives  the  right  answer  with  high  probability. 
Rabin’s  primality-testing  algorithm  [RabSO]  is  based  on  the  existence  of  a 
polynomial-time  computable  predicate  Pn(fl)  with  the  following  properties: 
(1)  if  n  is  composite,  then  at  least  3/4  of  the  a  G  {1, . . .  ,n  —  1}  cause  /n(a) 
to  be  true,  and  (2)  if  n  is  prime,  then  no  such  a  causes  n  to  be  true.  Ra¬ 
bin’s  algorithm  generates  a  polynomial  number  of  a’s  at  random.  If  Pn(a)  is 
true  for  any  of  the  a’s  generated,  then  the  algorithm  outputs  “composite”; 
otherwise  it  outputs  “prime” .  Property  (2)  guarantees  that  if  the  algorithm 
outputs  “composite”,  then  n  is  definitely  composite.  If  the  algorithm  out¬ 
puts  “prime”,  then  there  is  a  chance  that  n  is  not  prime,  but  property  (1) 
guarantees  that  this  is  very  rarely  the  case:  if  n  is  indeed  composite,  then 
with  high  probability  the  algorithm  outputs  “composite” .  If  the  algorithm 
outputs  “prime”,  therefore,  it  might  seem  natural  to  say  that  n  is  prime  with 
high  probability;  but,  of  course,  this  is  not  quite  right.  The  input  n  is  either 
prime  or  it  is  not;  it  does  not  make  sense  to  say  that  it  is  prime  with  high 
probability.  On  the  other  hand,  it  does  make  sense  to  say  that  the  algorithm 
gives  the  correct  answer  with  high  probability.  The  natural  way  to  meike  this 
statement  precise  is  to  partition  the  runs  of  the  algorithm  into  a  collection  of 
subsystems,  one  for  each  possible  input,  and  prove  that  the  algorithm  gives 
the  right  answer  with  high  probability  in  each  of  these  subsystems,  where 
the  probability  on  the  runs  in  each  subsystem  is  generated  by  the  random 
choices  for  a.  While  for  a  fixed  composite  input  n  there  may  be  a  few  runs 
where  the  algorithm  incorrectly  outputs  “prime”,  in  almost  all  runs  it  will 
give  the  correct  output. 

In  many  contexts  of  interest,  the  choice  of  input  is  not  the  only  source  of 
nondeterminism  in  the  system.  Later  nondeterministic  choices  may  also  be 
made  throughout  a  run.  In  asynchronous  distributed  systems,  for  example, 
it  is  common  to  view  the  choice  of  the  next  processor  to  take  a  step  or  the 
next  message  to  be  delivered  as  a  nondeterministic  choice.  Similar  arguments 
to  those  made  above  can  be  used  to  show  that  we  need  to  factor  out  these 


nondeterministic  choices  in  order  to  use  the  probabilistic  choices  (coin  tosses) 
to  place  a  well-defined  probability  on  the  set  of  runs.  A  common  technique 
for  factoring  out  these  nondeterministic  choices  is  to  assume  the  existence 
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Figure  4.1:  A  (labeled)  computation  tree. 


class  of  “fair”  schedulers  or  “polynomial-time”  schedulers,  and  argue  that  for 
every  scheduler  in  this  class  the  system  satisfies  some  condition. 

As  we  now  show,  if  we  view  all  nondeterministic  choices  as  under  the 
control  of  some  adversuy  from  some  class  of  adversaries,  then  there  is  a 
straightforward  way  to  view  the  set  of  runs  of  a  system  as  a  collection  of 
probability  spaces,  one  for  each  adversary.  By  fixing  an  adversary  we  fac¬ 
tor  out  the  nondeterministic  choices  and  are  left  with  a  purely  probabilistic 
system,  with  the  obvious  distribution  on  the  runs  determined  by  the  proba¬ 
bilistic  choices  made  during  the  runs.  This  is  essentially  the  approach  taken 
in  [FZ88]. 

Once  we  fix  an  adversary  A,  we  can  view  the  rims  of  the  system  with  this 
adversary  as  a  (labeled)  computation  tree  Ta  (see  Figure  4.1).  Nodes  of  the 
tree  are  global  states  and  paths  in  the  tree  are  runs.  Now,  however,  edges 
of  the  tree  are  labeled  with  positive  real  numbers  such  that  for  every  node 
the  values  labeling  the  node’s  outgoing  edges  sum  to  1.  Intuitively,  the  value 
labeling  an  outgoing  edge  of  node  a  represents  the  probability  the  system 
makes  the  corresponding  transition  from  node  a.^  Given  a  finite  path  in  the 

^Since  all  edges  have  positive  labels,  we  are  effectively  ignoring  tians’tions  with  proba¬ 
bility  0,  and  assuming  that  there  is  a  discrete  probability  distribution  on  the  set  of  pdssible 
transitions  at  each  node.  It  follows  that  each  node  can  have  at  most  a  countable  number 
of  outgoing  edges.  This  means,  for  example,  that  we  are  disallowing  the  possibility  that 
the  next  step  will  be  a  random  assignment  to  a  variable  z  chosen  with  uniform  probability 
from  the  interval  [0, 1].  We  could  easily  extend  our  model  to  deal  '.nth  this  situation  by 
assigning  probabilities  to  sets  of  transitions,  rather  than  just  individual  transitions.  We 
have  chosen  to  consider  only  discrete  probability  distributions  here  for  ease  of  exposition. 
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tree,  the  probability  of  the  set  of  runs  extending  this  finite  path  is  simply 
the  product  of  the  probabilities  labeling  the  edges  in  this  finite  path. 

It  is  natural  to  view  this  computation  tree  Ta  as  a.  probability  space,  a 
tuple  {RAy^AiPA)  where  Ra  is  the  set  of  runs  in  Ta,  Xa  consists  of  subsets 
of  Ra  that  are  measurable  (that  is,  the  ones  to  which  a  probability  can  be 
assigned;  these  are  generated  by  starting  with  sets  of  runs  with  a  common 
finite  prefix  and  closing  under  countable  union  and  complementation),  and 
a  probability  function  fiA  defined  on  sets  in  Xa  so  that  the  probability  of  a 
set  of  runs  with  a  common  prefix  is  the  product  of  the  probabilities  labeling 
the  edges  of  the  prefix.  If  we  restrict  attention  to  finite  runs  (as  is  done 
in  [FZ88]),  then  it  is  easy  to  see  that  each  individual  run  is  measurable,  so 
that  Xa  consists  of  edl  possible  subsets  of  Ra-  Moreover,  in  the  case  of  finite 
runs,  the  probability  of  a  run  is  just  the  product  of  the  transition  probabilities 
along  the  edges  of  the  run. 

It  is  occasionally  useful  to  view  this  computation  tree  Ta  as  consisting  of 
two  components;  the  tree  structure  (that  is,  the  unlabeled  graph  itself),  and 
the  assignment  of  transition  probabilities  to  the  edges  of  the  tree.  Given  an 
unlabeled  tree  Ta,  'fit  define  a  transition  probability  assignment  for  Ta  to  be 
a  mapping  r  assigning  transition  probabilities  to  the  edges  oITa-  We  will 
use  the  notation  Ta  at  times  to  refer  to  the  unlabeled  tree,  to  the  labeled 
tree,  and  to  the  induced  probability  space;  which  is  meant  should  be  clear 
from  context. 

We  define  a  probabilistic  system  to  consist  of  a  collection  of  labeled  com¬ 
putation  trees  (which  we  view  as  separate  probability  spaces),  one  for  each 
adversary  A  in  some  set  A.  We  assume  that  the  environment  component  in 
each  global  state  in  Ta  encodes  the  adversary  A  and  the  entire  past  history 
of  the  run.  This  technical  assumption  ensures  that  different  nodes  in  the 
same  computation  tree  have  different  global  states,  and  that  we  cannot  have 
the  same  global  state  in  two  different  computation  trees.  Given  a  point  c,  we 
denote  the  computation  tree  containing  c  by  T[c).  Our  technical  assumption 
guarantees  that  T(c)  is  well-defined. 

The  choice  of  the  appropriate  set  A  of  adversaries  against  which  the  sys¬ 
tem  runs  is  typically  made  by  the  system  designer  when  specifying  correct¬ 
ness  conditions  for  the  system.  An  adversary  might  be  limited  to  choosing 
the  initial  input  of  the  agents  (in  which  case  the  set  of  possible  adversaries 
would  correspond  to  the  set  of  possible  inputs)  as  is  the  case  in  the  context 
of  primality-testing  algorithms  in  which  an  agent  receives  a  single  number 
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(the  number  to  be  tested)  as  input.  On  the  other  hand,  an  adversary  may 
2l1so  determine  the  order  in  which  agents  are  allowed  to  take  steps,  the  order 
in  which  messages  arrive,  or  the  order  in  which  processors  fail.  One  might 
also  wish  to  restrict  the  computational  power  of  the  adversary  to  polynomial 
time.  It  depends  on  the  application. 


4.3  Probability  at  a  point 

We  are  interested  in  understanding  knowledge  and  probability  in  distributed 
systems.  An  agent’s  knowledge  varies  over  time,  as  its  state  changes.  We 
would  expect  the  probability  an  agent  assigns  to  an  event  to  vary  over  time 
as  well.  Clearly  an  agent’s  probability  distribution  at  a  given  point  must 
somehow  be  related  to  the  distribution  on  runs  if  it  is  to  be  at  ail  meaningful. 
Nevertheless,  the  two  distributions  (the  overall  distribution  on  the  runs  of  a 
system  and  the  distribution  on  the  runs  an  agent  uses  at  a  point)  are  quite 
different;  depending  on  which  of  the  distributions  we  use,  we  can  be  led  to 
quite  different  analyses  of  a  protocol. 

To  understand  this  distinction,  consider  the  Coordinated  Attack  prob¬ 
lem  [Gra78].  Two  generals  A  and  B  must  decide  whether  to  attack  a  common 
enemy,  but  we  require  that  any  attack  be  a  coordinated  attack;  that  is,  A 
attacks  iff  B  attacks.  Unfortunately,  they  can  communicate  only  by  messen¬ 
gers  who  may  be  captured  by  the  enemy.  It  is  known  that  it  is  impossible  for 
the  generals  to  coordinate  an  attack  under  such  conditions  [Gra78,  HM84]. 
Suppose,  however,  we  relax  this  condition  and  require  only  that  the  generals 
coordinate  their  attack  with  high  probability  [FH88,  FZ88].  To  eliminate 
all  nondeterminism,  let  us  assume  general  A  tosses  a  fair  coin  to  determine 
whether  to  attack,  and  let  us  assume  the  probability  a  messenger  is  lost  to 
the  enemy  is  1/2.  Our  new  correctness  condition  is  that  the  condition  “A 
attacks  iff  B  attacks”  holds  with  probability  .99. 

Consider  the  following  two-  jtep  solution  CAi  to  the  problem.  At  round 
0,  A  tosses  a  coin  and  sends  10  messengers  to  B  iff  the  coin  landed  heads.  At 
round  1,  B  sends  a  messenger  to  tell  A  whether  it  has  learned  the  outcome 
of  the  coin  toss.  At  round  2,  A  attacks  iff  the  coin  landed  heads  (regardless 
of  what  it  hears  from  B)  and  B  attacks  iff  at  round  1  it  learned  that  the  coin 
landed  heads.  It  is  not  hard  to  see  that  if  we  put  the  natural  probability 
space  on  the  set  of  runs,  then  with  probability  at  least  .99  (taken  over  the 
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runs)  A  attacks  iif  B  attacks;  if  the  coin  lands  tails  then  neither  attacks, 
and  if  the  coin  lands  heads  then  with  probability  at  least  .99  at  lezist  one  of 
the  ten  messengers  sent  &om  i4  to  B  at  round  0  avoids  capture  and  both 
generals  attack. 

This  is  very  different,  however,  from  saying  that  at  all  times  both  generals 
know  that  with  probability  at  least  .99  the  attack  will  be  coordinated.  To 
see  this,  consider  the  state  just  before  attacking  in  which  A  has  decided  to 
attack  but  has  received  a  message  from  B  saying  that  B  has  not  learned 
the  outcome  of  the  coin  toss.  At  this  point,  A  is  certain  the  attack  will  not 
be  coordinated.  Although  we  have  not  yet  given  a  formal  definition  of  how 
to  compute  an  agent’s  probability  at  a  given  point,  it  seems  unreasonable 
for  an  agent  to  believe  with  high  probability  that  an  event  will  occur  when 
information  available  to  the  agent  guarantees  it  will  not  occur. 

On  the  other  hand,  consider  the  solution  CA2  differing  from  the  preced¬ 
ing  one  only  in  that  B  does  not  try  to  send  a  messenger  to  A  at  round  1 
informing  A  about  whether  B  has  learned  the  outcome  of  the  coin  toss.  An 
easy  argument  shows  that  in  this  protocol,  at  all  times  both  generals  have 
confidence  (in  some  sense  of  the  word)  at  least  .99  that  the  attack  will  be  co¬ 
ordinated.  Consider  B,  for  example,  after  having  failed  to  receive  a  message 
from  A.  B  reasons  that  either  A’s  coin  landed  tails  and  neither  general  will 
attack,  which  would  happen  with  probability  1/2,  or  A’s  coin  landed  heads 
and  all  messengers  were  lost,  which  would  happen  with  probability  1/2^^; 
and  hence  the  conditional  probability  that  the  attack  will  be  coordinated 
given  that  B  received  no  messages  from  A  is  at  least  .99. 

As  the  preceding  discussion  shows,  in  a  protocol  which  has  a  certain 
property  P  with  high  probability  taken  over  the  runs,  an  agent  may  still 
find  itself  in  a  state  where  it  knows  perfectly  well  that  P  does  not  (and  wiU 
not)  hold.  While  correctness  conditions  P  for  problems  arising  in  computer 
science  have  typically  been  stated  in  terms  of  a  probability  distribution  on 
the  runs,  it  might  be  of  interest  to  consider  protocols  where  an  agent  knows 
P  with  high  probability  at  all  points.  As  we  shall  show,  the  probability 
distribution  on  the  runs  typically  corresponds  to  each  agent’s  probability 
distribution  at  time  0.  Thus,  we  can  view  the  probability  on  the  runs  as  an 
a  priori  probability  distribution.  To  require  a  fact  (or  a  condition  P)  to  hold 
with  high  probability  from  each  agent’s  point  of  view  at  all  times  is  typically 
a  much  stronger  requirement  than  requiring  it  to  hold  with  high  probability 
over  the  set  of  runs.  Arguably,  in  many  cases,  it  is  also  a  more  natural 
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requirement.  It  seems  quite  natural,  for  example,  to  require  of  a  coordinated 
attack  protocol  that  A  have  high  confidence  at  all  points  that  the  attack  will 
be  coordinated,  rather  than  allowing  A  to  attack  even  when  it  is  certain  the 
attack  will  be  uncoordinated. 


4.4  Definitions  of  probabilistic  knowledge 

We  want  to  make  sense  of  statements  such  as  “at  the  point  c,  agent  p,-  knows 
<p  holds  with  probability  a”.  The  problem  is  that,  although  we  typically  have 
a  well-defined  probability  distribution  on  the  set  of  runs  in  each  computa¬ 
tion  tree,  in  order  to  make  sense  of  such  stt  '^ements  we  need  a  probability 
distribution  on  the  points  pi  considers  possib  e  at  c.  The  reason  we  need  a 
distribution  on  points  and  not  just  on  runs  is  ihat  many  interesting  facts  are 
facts  about  points  and  not  about  runs.  Consider,  for  example,  the  fact  “the 
most  recent  coin  tossed  landed  heads”.  If  a  coin  is  tossed  many  times  in  a 
single  run,  this  fact  may  be  true  at  some  points  of  the  run  and  false  at  others, 
and  hence  is  a  fact  about  points  and  not  about  runs.  When  reasoning  about 
probabilistic  protocols,  it  seems  quite  natural  to  want  to  make  formal  state¬ 
ments  of  the  form  “agent  p  knows  with  probability  1/2  that  the  most  recent 
coin  tossed  by  agent  q  landed  heads”.  It  is  possible  to  reformulate  this  state¬ 
ment  so  that  it  becomes  a  fact  about  runs.  The  fact  “the  coin  tossed  by 
agent  q  landed  heads”  is  a  fact  about  runs;  and  the  statement  above  can  be 
reformulated  as  “for  all  times  k,  if  the  current  time  is  k,  then  agent  p  knows 
with  probability  1/2  that  the  coin  tossed  by  agent  q  landed  heads”.  In 
our  opinion,  the  former  statement  more  naturally  corresponds  to  the  way  we 
think  about  such  protocols.  If  we  are  willing  to  restrict  our  attention  to  facts 
about  the  run,  then  we  can  make  do  simply  with  a  distribution  on  runs,  but 
this  precludes  (or  at  least  complicates)  the  discussion  of  many  interesting 
events  in  a  system. 

We  begin  by  reviewing  the  general  framework  of  [FH88]  in  which,  given 
a  particular  assignment  of  probability  spaces  to  points  and  agents,  we  can 
make  sense  of  such  statements  about  an  agent’s  probabilistic  knowledge. 
The  remainder  of  the  chapter  will  focus  on  the  construction  of  appropriate 
probability  assignments. 

Define  a  probability  aasignment  7^  to  be  a  mapping  from  an  agent  pi  and 
point  c  to  a  probability  space  Vi,c  —  {Si,e,  Pi,e)>  Here  5i,c  is  a  set  of 
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points,  is  the  set  of  measurable  subsets  of  5,-,c,  and  is  a  probability 
function  assigning  a  probability  to  the  sets  in  In  most  cases  of  interest, 
one  can  think  of  S{,e  as  a  subset  of  the  points  agent  pi  considers  possible 
at  c,  and  of  as  indicating  the  relative  likelihood  according  to  p,-  that  a 
particular  point  in  Si,c  is  actually  the  current  point  c.® 

Given  such  an  assignment,  let  Si^c(<p)  be  the  set  of  the  points  in  5j,c 
satisfying  (p;  that  is,  S{,e(<p)  =  {d  €  Si,e  :  d  |=  y>}.  It  is  natural  to  interpret 
P'i,e{Si,c{(p))  as  the  probability  is  true,  according  to  agent  pi  at  the  point 
c.  One  problem  with  this  interpretation,  of  course,  is  that  the  set  Si,e{'p)  is 
not  guaranteed  to  be  measurable,  and  hence  is  not  guaranteed  to 

be  well-defined.  In  order  to  deal  with  this  problem,  we  follow  the  approach 
of  (FH88],  and  make  use  of  inner  and  outer  measures.  Given  a  probability 
space  {S,X,fi),  the  inner  measure  fi^  and  outer  measure  fi*  are  defined  by 

p,{S')  =  sup  {fi{T) :  T  C  5"  end  T  E  X} 
fills')  =  inf  {/i(r)  :TDS'andTsX} 

for  all  subsets  S'  of  5.  Roughly  speaking,  the  inner  (resp.  outer)  measure 
of  Si,e{(p)  is  the  best  lower  (resp.  upper)  bound  on  the  probability  ip  is  true, 
according  to  Pi  at  c.  It  is  easy  to  see  that  fi*{T)  =  1  —  p,(r®)  for  any  set 
T,  where  T‘  is  the  complement  of  T.  Given  a  probability  assignment  V,  we 
write  7^,  c  [=  Pri{(p)  >  a  to  mean  Pi,cS^i,e{'P))  ^  Note  that  we  need 
the  probability  assignment  V  to  make  sense  of  Pri,  We  take  Kfp  to  be  an 
abbreviation  for  Ki{Pri{(p)  >  a);  thus  K^<p  means  that  agent  pi  knows  that 
the  probability  of  (p  is  at  least  a  since  Pri{<p)  >  a  holds  at  all  points  p,- 
considers  possible. 

We  now  have  all  the  definitions  needed  to  give  semantics  to  a  logical 
language  of  knowledge  and  probability.  In  particular,  the  language  of  most 

^We  often  follow  the  standard  practice  [HalSO,  p.  73]  of  identifying  the  probability  space 
Vi^c  with  the  sample  space  5{,c;  the  intention  should  be  cleat  &om  context. 

’Returning  to  the  question  of  distributions  on  runs  versus  points,  notice  that  u  long 
as  the  set  5t,e  does  not  contain  more  than  one  point  per  run,  there  is  a  natural  bijection 
from  the  probability  on  the  points  in  5{,c  to  the  probability  on  the  runs  going  through  5{,e. 
In  general,  however,  we  allow  more  than  one  point  on  the  same  run  to  appear  in  5{,e.  As 
we  shall  see  in  the  next  section,  this  generality  is  useful  when  dealing  with  asynchronous 
systcxiaS* 

’We  remark  that  we  can  easily  extend  these  definitions  to  more  complicated  formulas 
such  as  Pri{<p)  >  2Pri(V');  see  [FH88]. 
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interest  to  us  in  the  remainder  of  this  chapter  is  the  language  £($)  obtained 
by  fixing  a  set  $  of  primitive  propositions  and  closing  under  the  standard 
boolean  connectives  (conjunction  and  negation),  the  knowledge  operators  if,-, 
probability  formulas  of  the  form  Pri{<p)  >  a,  and  the  standaird  (linear  time) 
temporal  logic  operators  next  Q  and  until  U.  Note  that  >C($)  is  sufficiently 
powerful  to  express  the  operators  iff  and  the  temporal  operators  henceforth 
□  and  eventually  0/  In  the  context  of  a  given  system,  we  say  that  £($) 
is  state-generated  if  each  of  the  primitive  propositions  in  $  is  a  fact  about 
the  global  state;  and  we  say  that  £($)  is  sufficiently  rich  if  for  every  global 
state  g  there  is  a  primitive  proposition  in  $  true  at  precisely  those  points 
with  global  state  g.  This  condition  ensures  that  the  language  £($)  is  rich 
enough  to  allow  us  to  talk  about  individual  global  states.  The  assumption 
that  £($)  is  state-generated  is  quite  reasonable  in  practice:  we  typically  take 
the  primitive  propositions  to  represent  facts  such  as  “the  coin  landed  heads” , 
“the  message  was  received”,  or  “the  value  of  variable  x  is  0”.  Each  of  these 
facts  is  a  fact  about  the  global  state,  assuming  certain  aspects  of  the  history 
are  recorded  in  the  global  state.  Sufficient  richness  is  a  technical  condition 
required  for  a  few  of  our  results.  We  can  always  make  a  language  sufficiently 
rich  by  adding  primitive  propositions. 

We  now  have  a  natural  way  of  making  sense  of  knowledge  and  probability, 
given  a  probability  assignment  V.  Unfortunately,  we  still  do  not  know  how 
to  choose  V,  but  our  choices  are  somewhat  more  constrained  than  they  may 
at  first  appear.  We  are  given  the  computation  trees  and  the  associated 
distributions  on  runs,  and  we  clearly  want  the  distribution  on  the  sample 
space  Si,c  of  points  we  associate  with  agent  pi  at  point  c  to  be  related  somehow 
to  these  distributions  on  runs.  We  next  show  that  once  we  choose  the  sample 
spaces  5'i,c,  there  is  a  straightforward  way  to  use  the  distribution  on  runs  to 
induce  a  distribution  on  5i,c.  Thus,  once  we  are  given  an  appropriate  choice  of 
sample  spaces  and  the  distributions  on  runs  of  the  computation  trees,  we  can 
construct  the  probability  assignment.  The  problem  of  choosing  a  probability 

^Wc  define  (r,  k)  |=  Qip  iff  (r,  A  -fl)  |=  v?.  »o  Qip  is  true  at  time  A  in  a  run  iff  it  is  true 
at  time  +  1,  after  the  next  step.  We  define  (r,fc)  \=  {pU^f>  to  mean  there  exists  t>  k 
such  that  (r,  /)  [=  and  (r,  I')  |=  ^  for  all  I'  with  k<t'  <t.  Thus  (pUiliia  true  at  (r,  k)  if 
V*  is  true  at  some  point  in  the  future,  and  (p  is  true  until  then.  Recall  that  0(p,  which  says 
that  (p  is  true  at  some  point  in  the  future,  can  be  taken  as  an  abbreviation  of  true  U  <p’, 
and  that  Otp,  which  says  that  ip  is  true  now  and  forever  in  the  future,  is  an  abbreviation 
for  -iO~>(p. 
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assignment,  therefore,  essentially  reduces  to  choosing  the  sample  spaces.  This 
reduction  will  clarify  important  issues  in  determining  the  appropriate  choice 
of  probability  assignments. 

The  idea  of  our  construction  is  quite  straightforward:  given  a  sample 
space  Si,c  and  a  subset  S  C  Si,c,  the  probability  of  S  (relative  to  5,-,e)  is  just 
the  probability  of  the  runs  going  through  S  normalized  by  the  probability 
of  the  set  of  runs  going  through  Si,c>  In  other  words,  the  probability  of  S  is 
the  conditional  probability  a  run  passes  through  S,  given  that  the  run  passes 
through  Si,e- 

In  order  for  this  simple  idea  to  work,  however,  the  set  Si,e  must  satisfy 
a  few  requirements.  One  natural  choice  for  Si,c  is  the  set  /C,(c)  of  all  points 
agent  p,-  considers  possible  at  c.  In  general,  however,  this  set  contains  points 
from  many  different  computation  trees,  and  attempting  to  impose  a  distri¬ 
bution  on  this  set  of  points  leads  to  the  same  difficulties  that  led  us  to  factor 
out  nondeterminism  and  view  a  system  as  a  collection  of  computation  trees 
in  the  first  place.  Recall  the  example  from  Section  3  in  which  pi  tosses  a  fair 
or  biased  coin,  depending  on  whether  its  input  is  0  or  1.  Before  (and  after) 
the  coin  is  tossed,  pa  considers  four  worlds  possible,  one  from  each  possible 
run.  We  can  no  more  place  a  probability  on  these  points  than  we  could  place 
a  probability  on  the  four  runs.  On  the  other  hand,  given  a  point  c  from  a 
run  with  input  bit  1  (corresponding  to  the  biased  coin),  if  we  restrict  82,0 
to  consist  of  the  two  points  in  the  computation  tree  with  input  1,  then  we 
can  put  a  probability  on  the  two  points  in  the  obvious  way  and  compute  the 
probability  of  heads  as  2/3.  This  intuition  leads  us  to  require  that  each  set 
Si,c  be  contained  entirely  within  a  single  computation  tree: 

REQ\.  All  points  of  Si^c  ar’e  in  T(c). 

We  remark  that,  while  REQ\  does  not  allow  us  to  take  iS',-,c  to  be  all  of 
/Ci(c),  it  still  seems  natural  to  choose  5.-.C  C  /Ci(c).  We  say  that  a  probability 
assignment  is  consistent  if  it  satisfies  this  condition.  As  pointed  out  in  [FH88j, 
a  consequence  of  this  is  that  if  pf  knows  then  <p  holds  with  probability  1; 
that  is,  Kiijp)  {Pri{ifi)  =  1).*  With  a  consistent  assignment,  it  cannot 
be  the  case  that  agent  p,  both  knows  <p  smd  at  the  same  time  assigns  -xp 
positive  probability. 

^In  fact,  as  pointed  out  in  [FH88],  this  axiom  characterises  the  property  that  the 
probability  space  used  by  pi  is  a  subset  of  the  points  that  Pi  considers  possible. 
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The  single  condition  REQi ,  however,  is  not  enough  for  our  idea  for  impos¬ 
ing  a  distribution  on  the  set  Si,c  of  points  to  work.  Because  this  idea  involves 
conditioning  on  the  set  of  runs  passing  through  Si,e,  the  definition  of  condi¬ 
tional  probability  forces  us  to  require  that  that  this  set  of  runs  is  a  measurable 
set  with  positive  measure.  Suppose  T(c)  =  {1Za}  for  some  adversary 

A.  Gi\'en  a  set  S  of  points  contained  in  T{c),  denote  by  RiS)  the  set  of 
runs  passing  through  S’,  that  is,  7^(5)  =  {r  €  Ha  -  (»*>^)  €  S  for  some  k}. 
We  require  that 

REQr-  'R.{Si,e)  €  Xa  and  {iA{'RiSi,c))  >  0. 

REQi  is  a  relatively  weak  requirement.  The  following  lemma  shows  that, 
in  practice,  REQi  is  t3q)ically  satisfied.  A  set  S  of  points  is  said  to  be  state- 
generated  if  (r,  k)  ^  S  and  r(k)  —  T'{k')  imply  (r',  k')  E  S;  in  other  words,  S 
contains  all  points  with  the  same  global  state  as  (r,  k). 

Proposition  4.1:  If  is  state-generated  and  satisfies  REQi,  then  Si,c 
satisfies  REQi. 

The  proof  of  Proposition  4.1  (and  all  other  technical  results  in  this  chap¬ 
ter)  can  be  found  in  Appendix  4.  A.  We  remark  that  this  statement  is  actually 
independent  of  the  transition  probability  assignment  r  assigning  probabili¬ 
ties  to  the  edges  of  Ta.  'While  REQi  seems  to  depend  on  both  Si,o  and  r, 
Proposition  4.1  tells  us  we  can  choose  5, *,6  without  regard  for  r  and  be  confi- 
dc.it  REQi  will  be  satisfied  for  whatever  r  we  eventually  choose,  as  long  as 
Si,e  is  state-generated. 

Given  a  set  of  points  Si,^  satisfying  REQi  and  REQi,  we  now  make 
precise  our  idea  for  imposing  a  distribution  on  5,-,c.  Intuitively,  to  construct 
the  collection  Xi,e  of  measurable  subsets  of  we  project  the  measurable 
subsets  of  the  runs  of  T(c)  onto  5i,c.  Formally,  given  a  set  W  of  funs  and  a 
set  S  of  points,  we  define  Proj{'R/,  S)  =  {(r,  k)E  S  :  r  €  IV}.  We  define 

A'i.c  =  {Pr>oJf(7^^5.•c):7^'€AU}. 


Finally,  we  define  the  probability  function  on  the  measurable  subsets  of 
Si^c  via  conditional  probability: 


inAS\ 


/»_..(«{ 5) !  K(Si,.)) 


Mns)) 


foE  all  5  G  ^j,e.  Let  Pi,.  = 
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Proposition  4.2:  If  5{,e  satisfies  REQi  and  REQ2,  then  Pi,c  is  a  probability 
space. 

We  can  now  formalize  our  intuition  that  the  construction  of  probability 
assignments  reduces  to  the  choice  of  sample  spaces.  Given  a  system  (i.e.  a 
collection  of  labeled  computation  trees),  define  a  sample  space  assignment 
to  be  a  function  S  that  assigns  to  each  agent  pi  and  point  c  a  sample  space 
S{i,  c)  =  Si,e  satisfying  REQi  and  REQ2.  Given  a  sample  space  assignment 
S,  our  construction  shows  how  to  obtain  a  probability  space  for  all  agents 
Pi  and  all  points  c.  This  naturally  determines  a  probability  assignment  V, 
which  we  call  the  the  probability  assignment  induced  by  S.  We  note  that  the 
definition  of  V  actually  depends  on  both  the  sample  space  assignment  S  and 
the  trzmsition  probability  assignment  r  (implicitly  determined  by  the  fact 
that  we  have  labeled  computation  trees).  There  are  times  when  it  is  conve- 
nient  to  start  with  an  unlabeled  computation  tree,  labeled  by  some  transition 
probability  assignment  r.  In  this  case,  we  refer  to  V  as  the  probability  as¬ 
signment  induced  by  S  and  r.  For  future  reference,  we  define  a.fact  <p  to  be 
measurable  with  respect  to  S  if  Si,e{<p)  €  Xi,c  for  all  agents  Pi  and  points  c. 

The  preceding  discussion  makes  precise  the  idea  that  choosing  a  proba¬ 
bility  assignment  reduces  to  choosing  a  sample  space  assignment,  but  still 
does  not  help  us  choose  the  sample  space  assignment.  Different  choices  re¬ 
sult  in  probability  assignments  with  quite  different  properties.  Let  us  return 
to  the  example  in  the  introduction,  where  pi  tosses  a  fair  coin,  and  neither 
P2  nor  ps  observe  the  outcome.  Clearly,  at  time  2  (after  the  coin  has  been 
tossed),  P2  considers  two  points  possible:  say  h  (the  coin  landed  heads)  and 
t  fthe  coin  landed  tails).  Consider  the  sample  space  assignment  such  that 
S^{2,h)  =  S^{2,t)  =  {h,t}.  Thus,  at  both  of  the  points  h  and  t,  the  same 
sample  space  is  being  used.  In  this  case,  at  both  points,  the  probability  of 
heads  is  1/2.  Thus,  with  respect  to  the  induced  probability  assignment,  p2 
knows  that  the  probability  of  heads  is  1/2.  On  the  other  hand,  consider 
assignment  such  that  S^{2,h)  =  {h}  and  S^{2,t)  =  {t}.  With  respect  to 
the  induced  probability  assignment,  the  probability  of  heads  at  h  according 
to  P2  is  1,  while  the  probability  of  heads  at  t  is  0.  In  this  case,  all  that  p2 
can  say  is  that  it  knows  that  the  probability  of  heads  is  either  1  or  0,  but 
it  doesn’t  know  which.  Which  is  the  right  probability  assignment?  As  we 
hinted  in  the  introduction,  the  answer  depends  on  another  type  of  adversary, 
the  one  that  P2  views  itself  as  playing  against.  This  is  the  focal  point  of  the 
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next  section. 

We  conclude  this  section  with  one  further  ex."»nple.  Consider  a  system 
where  a  fair  die  is  tossed  by  pi  eind  does  not  kn  ■  the  outcome.  Suppose 
that  at  time  2  the  die  has  already  been  tossed.  Let  ci,...c«  be  the  six 
points  corresponding  to  the  possible  outcomes  of  the  die.  What  sample 
space  assignment  should  we  use  for  pa?  One  obvious  choice  is  to  take  the 
assignment  which  assigns  the  same  sample  space  at  all  six  prints,  the 
space  consisting  of  all  the  points.  With  respect  to  this  sample  space,  each 
point  will  have  probability  1/6.  Let  (p  be  the  statement  “the  die  landed  on  an 
even  number” .  Clearly,  in  the  probability  space  induced  by  this  sample  space, 
(p  holds  with  probability  1/2.  Since  p2  uses  the  same  sample  space  at  all  six 
points,  agent  p2  knows  that  the  probability  of  (p  is  1/2.  A  second  possibility 
is  to  consider  two  sample  spaces  i^i  =  {ci,  C2,  C3}  and  S2  =  {c4,  cs,  ce};  let  the 
assignment  assign  the  sample  space  Si  to  agent  P2  at  all  the  points  in  <?!, 
and  the  sample  space  S2  at  all  the  points  in  •S'2.  Thus,  at  aU  the  points  in 
5i,  the  probability  of  (p  is  1/3,  while  at  all  the  points  in  5'2j  the  probability 
of  (p  is  2/3.  AU  P2  can  say  is  that  it  knows  that  the  probability  of  <p  is  either 
1/3  or  2/3,  but  it  does  not  know  which. 

Clearly  we  can  subdivide  the  six  points  into  even  smaUer  subspaces.  It 
is  not  too  hard  to  show  that  the  more  we  subdivide,  the  less  precise  is  P2^s 
knowledge  of  the  probability.  (We  prove  a  formal  version  of  this  statement 
in  the  next  section.)  But  why  bother  subdividing?  Why  not  stick  to  the 
first  sample  space  assignment,  which  gives  the  most  precise  (and  seemingly 
natural)  answer?  Our  reply  is  that,  again,  this  may  not  be  the  appropriate 
answer  when  playing  against  certain  adversaries.  . 


4.5  Probability  in  synchronous  systems 

We  first  consider  the  problem  of  selecting  appropriate  probability  assign¬ 
ments  in  completely  synchronous  systems.  Intuitively,  a  system  is  syn¬ 
chronous  if  all  agents  effectively  have  access  to  a  global  clock.  Recall  from 
Chapter  2  that  a  system  is  synchronous  [HV89]  if  for  all  points  (r,  k)  and 
(r',  k')  and  all  agents  pi,  if  Ti{k)  =  rj(fc')  then  k  =  V.  Again,  this  means  that 
no  two  points  an  agent  pi  considers  indistinguishable  can  lie  on  the  same  run. 

When  considering  probability,  it  turns  out  that  many  things  become  much 
easier  in  the  context  of  synchronous  systems.  For  example,  it  turns  out 
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that,  in  practice,  sample  space  assignments  satisfy  three  natural  properties: 
(a)  they  are  state-generated;  (b)  they  are  inclusive,  which  means  c  €  5t,c 
for  all  agents  p,  and  points  c;  and  (c)  they  are  uniform,  which  means  that 
d  €  Si,e  implies  Si,d  =  Si,c  for  all  agents  Pi  and  points  c  and  d.®  We  say  that 
S  (and  its  induced  probability  assignment)  is  standard  if  it  satisfies  these 
three  properties.  For  the  remainder  of  this  section  we  consider  only  standard 
assignments. 

One  convenient  feature  of  synchronous  systems  is  that  all  facts  of  interest 
are  measurable.  Recall  that  £($)  is  state-generated  with  respect  to  a  system 
7?.  if  all  the  primitive  propositions  in  $  are  facts  about  the  global  state. 


Proposition  4.3:  In  a  synchronous  system,  if  5  is  a  consistent  standard 
assignment  and  £($)  is  state-generated,  then  fp  is  measurable  with  respect 
to  S  for  all  facts  6  >C($). 


This  result  says  that  for  all  practical  purposes  we  do  not  have  to  concern 
ourselves  with  nonmeasurable  sets  and  inner  measures  in  synchronous  sys¬ 
tems.  The  proof  is  by  induction  on  the  structure  of  (p,  and  can  be  found  in 
Appendix  4.A. 

We  begin  our  examination  of  probability  assignments  in  synchronous  sys¬ 
tems  by  defining  four  sample  space  assignments  eind  their  induced  probability 
assignments.  Each  of  these  assignments  can  be  understood  in  terms  of  a  bet¬ 
ting  game  against  an  appropriate  opponent.  (This  is  the  second  type  of 
adversary  mentioned  in  the  introduction.)  We  make  this  intuition  precise 
after  we  have  defined  the  probability  assignments. 

The  first  of  these  assignments  corresponds  to  what  decision  theorists 
would  call  an  agent’s  posterior  probability.  This  is  essentially  the  proba¬ 
bility  an  agent  would  assign  to  an  event  given  everything  the  agent  knows. 
This  intuitively  corresponds  to  the  bet  an  agent  would  be  willing  to  accept 
from  a  copy  of  itself,  someone  with  precisely  the  same  knowledge  that  it  has. 
We  make  this  relationship  between  probability  and  betting  precise  shortly. 

What  probability  space  corresponds  to  an  agent’s  conditioning  on  its 
knowledge  in  this  way?  Since  we  have  identified  an  agent  p^’s  knowledge  with 
the  set  of  points  pi  considers  possible  at  c,  this  set  of  points  seems  the  most 


..A 


^Condition  (c)  is  essentially  the  denniiioTi  of  a  uniroiiii  probability  asaignsicn _ 

[FH88].  A  probability  assignment  induced  by  a  uniform  sample  space  assignment  as  we 
have  defined  it  hers  is  a  uniform  probability  assignment  in  the  sense  of  [FH88]. 
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natural  choice  for  the  space.  As  we  have  seen,  however,  this  set  of  points  is 
not  in  general  contained  in  one  computation  tree.  Thus,  we  consider  instead 
the  set  of  points  in  c’s  computation  tree  T{c)  that  pi  considers  possible  at 
c.  This  is  just  the  set  Treei^c  =  {d  6  T(c) :  c  d}.  It  is  clear  that  IZVee,-,e 
satisfies  REQi\  that  it  satisfies  REQ2  follows  by  Proposition  4.1  since  it  is 
state-generated.  By  Proposition  4.2,  therefore,  the  induced  probability  space 
( 2Vee,-,c,  Pi,e)  is  indeed  a  probability  space.  Let  5'““  be  the  sample  space 
assignment  that  assigns  the  space  TVee,-,^  to  agent  p,-  at  the  point  c,  and  let 
V’’"'*  be  the  probability  assignment  induced  by  S'"'*. 

The  probability  space  has  a  natural  interpretation.  It  is  generated 
by  conditioning  on  everything  pi  knows  at  the  point  c  and  the  fact  that  it  is 
playing  against  the  adversary  A  that  generated  the  tree  7^  in  which  c  lies.  Of 
course,  the  agent  considers  many  adversaries  possible.  Thus,  the  statement 
7^'“’“,  c  1=  Kfip  means  that  for  all  adversaries  pi  considers  possible  at  c  (given 
its  information  at  c),  the  probability  of  (p  given  all  pi  knows  is  at  least  a. 
is  precisely  the  assignment  advocated  in  [FZ88]  in  the  synchronous  case. 

Suppose  now  that  p,-  were  considering  accepting  a  bet  from  someone  (not 
necessarily  an  agent  in  the  system)  with  complete  knowledge  of  the  past 
history  of  the  system.  In  this  case,  we  claim  that  the  appropriate  choice  of 
probability  space  for  p,-  at  the  point  c  =  (r,  k)  is  all  the  other  points  (r',  k) 
that  have  the  same  prefix  as  (r,  k)  up  to  time  k;  in  other  words,  all  points 
with  the  global  state  r(jfc).  Call  this  set  of  points  Prefi^^.  Note  that  Pref^^^ 
is  independent  of  p,-,  and  depends  only  on  the  point  c.  Moreover,  Prefi  ,. 
is  clearly  state-generated  (by  r{k)  itself),  so  by  Propositions  4.1  and  4.2, 
we  can  again  induce  a  natural  probability  distribution  on  this  set  of  points 
by  conditioning  on  the  runs  passing  through  Prc/,-  ,..  Let  denote  the 
sample  space  assignment  that  assigns  Prefi  ,,  to  pi  at  c,  and  let  denote 
the  probability  assignment  induced  by  We  remark  that  this  is  the 

probability  assignment  used  in  [HMT88],  as  well  as  [LS82j. 

In  the  probability  space  any  event  that  has  already  happened  by 
the  point  c  will  have  probability  1.  Future  events  (that  get  decided  further 
down  the  computation  tree)  still  have  nontrivial  probabilities,  which  is  why 
we  have  termed  it  a  future  probability  assignment. 

Let  us  reconsider  yet  again  the  coin  tossing  example  from  the  introduc¬ 
tion,  where  agent  P2  tosses  a  fair  coin  at  time  1  but  agents  pi  and  pa  do  not 
learn  the  outcome.  Since  the  coin  has  already  landed  at  time  2,  it  is  easy  to 
check  that  we  have  c  [=  Ki{Pri{heads)  =  1 V  Pri{heada)  =  0).  On  the 


120  CHAPTER  4.  KNOWLEDGE,  PROBABILITY,  ADVERSARIES 


other  hand,  we  have  p*’®**,  c  [=  Ki{Pri{heads)  =  1/2).  Thus,  and 
correspond  to  the  two  natural  answers  we  considered  for  the  probability  of 
heads.  They  capture  the  intuition  that  the  answer  depends  on  the  knowledge 
of  the  opponent  pi  is  betting  against:  V***  corresponds  to  betting  against  pj, 
a.ad  corresponds  to  betting  against  pa. 

Notice  that  in  both  the  cases  of  and  V***,  the  probability  space 
associated  with  an  agent  at  a  point  corresponds  to  the  set  of  points  the 
agent  and  its  opponent  both  consider  possible.  Suppose,  in  general,  that 
Pi  is  considering  what  an  appropriate  bet  to  accept  from  Pj  would  be.  We 
claim  (and  show  below)  that  in  this  case  the  probability  assignment  should 
be  generated  by  the  joint  knowledge  of  agents  pi  and  Pj,  as  represented  by 
the  intersection  of  the  points  they  both  consider  possible;  that  is,  by  the 
set  Tree’i  c  =  Treci^c  D  Tretj^c-  (Note  that  Tree) =  IZVeci.c,  so  that  this 
construction  can  be  viewed  as  a  generalization  of  the  previous  one.)  Again  it 
is  easy  to  see  that  Tree\^^  is  state-generated,  so  by  Propositions  4.1  and  4.2  we 
can  induce  the  natural  distribution  on  this  set  of  points  by  conditioning  on 
the  runs  passing  through  Tret\  ,,.  Let  be  the  sample  space  assignment  that 
assigns  TVecJ- to  p,  at  c,  and  let  V*  be  the  probability  assignment  induced 
by  SK 

All  the  examples  we  have  seen  up  to  now — 5'®“,  S*'\  S^,  and  — 
have  had  the  property  that  Si,c  C  Ki{c),  which  means  they  axe  consistent. 
As  mentioned  in  Section  4.4,  such  assignments  are  characterized  by  the  in¬ 
tuitively  desirable  condition  K{((p)  (Pri((p)  =  1);  when  we  return  to  the 
coordinated  attack  problem  in  Section  4.7,  we  will  see  an  example  of  em 
inconsistent  assignment  which  causes  an  agent  to  know  the  attack  will  be 
coordinated  with  high  probability,  while  knowing  that  the  attack  will  not 
be  coordinated(!).  While  consistency  seems  a  natural  restriction  on  prob¬ 
ability  assignments,  it  is  not  a  requirement  of  our  framework.  There  may 
be  be  technical  reasons  for  considering  inconsistent  assignments.  One  obvi¬ 
ous  (although  inconsistent)  probability  assignment  associates  with  the  point 
(r,  k)  the  set  of  all  time  k  points  in  its  computation  tree.  Call  this  set  A//i,e. 
(Alli^e  is  in  fact  independent  of  p;.)  The  probability  space  induced  by  the 
construction  of  Proposition  4.2  in  this  case  simulates  the  probability  on  the 
runs.  Let  us  denote  the  associated  sample  space  and  probability  assignments 
by  .5’"*®'  and  Notice  that  if  p;  uses  the  probability  space  it  is 

essentially  ignoring  all  that  it  h2is  learned  up  to  the  point  c,  which  is  why  we 
have  termed  it  a  prior  probability. 
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All  four  of  the  sample  space  assignments  we  have  constructed  are  standard 
assignments.  It  is  not  difficult  to  see,  in  fact,  that  any  assignment  constructed 
on  the  basis  of  some  opponent’s  knowledge  will  be  standard.  This  lends  some 
justification  to  our  restriction  to  standard  aissignments.  We  can.  view  these 
four  assignments  as  points  in  a  lattice  of  all  possible  standzird  sample  space 
assignments.  We  define  an  ordering  <  on  this  lattice  by  «S'  <  iff  S-^^  Q  ■S't.c 
for  every  agent  pi  and  point  c.  An  important  property  of  this  ordering  is  the 
following: 

Proposition  4.4:  If  S  and  S'  are  standard  assignments  satisfying  S'  <  S, 
then  for  every  agent  Pi  and  point  c,  the  set  Si,c  can  be  partitioned  into  sets 
of  the  form  S'^^^  with  d  €  S'j.c- 

Intuitively,  this  means  that  the  sets  are  refinements  of  the  sets  Si,c,  since 
the  sets  S-^^  are  obtained  by  carving  the  sets  Si,c  into  pieces.  Consider 
and  5^*‘,  for  example.  Every  set  Treci^c  oI  can  be  partitioned  into  the 
sets  Tree\^^  of  S^'*  with  d  6  Treeing.  In  fact,  it  is  clear  that 

Si**  <  S’  <  5”""  < 


Furthermore,  notice  that  5*"’“  is  greatest  (with  respect  to  <)  among  all 
consistent  sample  space  assignments. 

In  the  case  of  consistent  assignments,  if  we  interpret  Si,e  as  the  intersection 
of  Pi’s  knowledge  with  its  opponent’s  knowledge,  we  can  think  ol  S'  <  S 
as  roughly  meaning  that  the  opponent  corresponding  to  S'  considers  fewer 
points  possible  and  hence  knows  more  than  the  opponent  corresponding  to  S. 
This  means,  for  example,  that  S’””',  as  the  maximal  consistent  assignment, 
corresponds  to  playing  against  the  least  powerful  opponent. 

The  ordering  on  sample  spaces  assignments  induces  an  obvious  ordering 
on  probability  assignments:  given  two  sample  space  assignments  S'  and  S 
and  their  induced  probability  assignments  V  and  V,  respectively,  we  define 
V  <  V  ifl  S'  <  S.  An  important  point  to  note  is  that  if  V'  and  V  are 
consistent  assignments  satisfying  V'  <  V,  then  /ij  ^  can  be  obtained  from  pj,c 
by  conditioning  with  respect  to  S-  ,.: 

Proposition  4.5:  In  a  synchronous  system,  ifP'  and  P  are  consistent  stan¬ 
dard  assignments  satisfying  P'  <  P,  then  for  all  agents  p,-,  all  points  c,  and 
all  measurable  subsets  S'  € 
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(a) 


S'  €  .^t.c  (so  that,  in  partictUar,  S-^  itself  is  a  measurable  subset  of 


(h) 

(c) 


mUS')  =  t‘US'\SL)  = 


It  follows  that  any  consistent  probability  assignment  can  be  obtained  from 
•prott  jjy  conditioning. 

We  are  now  able  to  make  precise  the  sense  in  which  and  V***  are 

the  “right”  probability  assignments  for  an  agent  to  use  when  playing  against 
an  opponent  who  knows  exactly  as  much  as  it  does,  when  playing  against  pj, 
and  when  playing  against  an  opponent  who  has  complete  information  about 
the  past.  We  focus  on  V’  here,  but  the  arguments  are  the  same  in  all  cases. 

Consider  the  following  betting  game  between  agents  p;  and  pj  at  a  point 
c.  Agent  Pj  offers  p,-  a  payoff  of  for  a  bet  on  <p.  Agent  p;  either  accepts  or 
rejects  the  bet.  If  Pi  accepts  the  bet,  pi  pays  one  dollar  to  pj  in  order  to  play 
the  game,  and  pj  pays  ^  dollars  to  p<  if  (p  is  true  at  c.  Thus,  if  pi  accepts 
this  bet  at  the  point  c,  then  pi's  net  gain  is  either  /3  —  1  or  —1  depending  on 
whether  (p  is  true  or  false  at  c;  if  p,*  rejects  the  bet,  we  say  its  gain  is  0. 

Intuitively,  assuming  that  pi  is  risk  neutral,  Pi  can  always  be  convinced 
to  accept  a  bet  on  (p  no  matter  how  low  the  probability  of  <p  is,  as  long 
as  Pi  believes  there  is  some  nontrivial  chance  <p  is  true  and  the  payoff  ^  is 
high  enough.  Our  intuition  says  there  must  be  some  relationship  between 
the  probability  a  with  which  p,*  knows  (p  and  this  acceptable  payoff  j3  that 
would  induce  pi  to  accept  a  bet  on  (p.  If  a  is  close  to  0  then  Pi  might  require 
a  high  payoff  to  make  the  bet’s  risk  acceptable,  while  if  a  is  close  to  1  then 
Pi  might  be  willing  to  accept  a  much  lower  payoff  since  the  chance  of  losing 
is  so  remote.  Our  claim  that  is  the  right  probability  assignment  is  based 
on  the  fact  that  determines  for  an  agent  pi  the  lowest  acceptable  payoff 
for  a  bet  with  pj  on  a  fact  <p.  In  other  words,  V’  determines  precisely  how 
an  agent  pi  should  bet  when  betting  against  pj.  In  fact,  is  in  a  sense  the 
unique  such  probability  assignment.  We  now  make  this  intuition  precise. 

What  should  pi  consider  an  acceptable  payoff  for  a  bet  on  ip,  assuming 
Pi  does  not  want  to  lose  money  on  the  bet?  Since  p,-  is  presumably  following 
some  strategy  for  offering  bets  to  pi,  the  acceptable  payoff  should  take  this 
strategy  into  account.  Consider,  for  example,  the  system  in  which  pj  secretly 
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tosses  a  fair  coin  at  time  0,  and  offers  at  time  1  to  bet  p{  that  the  coin 
landed  heads.  If  Pj  is  following  the  strategy  of  always  offering  a  payoff  of  $2, 
independent  of  the  outcome  of  the  coin  toss,  then  p,-  can  always  safely  accept 
the  bet  since,  on  average,  it  wiU  not  lose  any  money  (that  is,  Pi’s  expected 
profit  is  -  ro).  If  pj  offers  a  payoff  of  $2  only  when  the  coin  lands  tails,  then 
Pi  is  certtun  to  lose  money.  On  the  other  hand,  if  pj  offers  a  payoff  of  $2  only 
when  the  coin  Izinds  heads,  then  it  is  pj  who  is  certain  to  lose  money.  While 
we  expect  that  pj  will  not  follow  a  strategy  that  will  cause  it  to  lose  money, 
we  assume  only  that  p/s  strategy  for  offering  bets  depends  only  on  its  local 
state.  In  other  words,  given  two  points  pj  is  unable  to  distinguish,  pj  must 
offer  the  same  payoff  for  a  bet  on  (p  at  both  points.  Formally,  a  strategy  for 
Pj  is  a  function  from  pj’s  local  state  at  a  point  c  to  the  payoff  pj  should  offer 
Pi  for  a  bet  on  p  at  c.  Similarly,  we  assume  that  pf’s  strategy  for  accepting 
or  rejecting  bets  (that  is,  for  computing  acceptable  payoffs)  is  also  a  function 
of  its  local  state. 


Again,  what  should  pf  consider  an  acceptable  payoff  for  a  bet  on  <pl 
Suppose  Pi  decides  it  will  accept  any  bet  on  (p  with  a  payoff  of  at  least 
1/a  when  its  local  state  is  Si  (remember  that  pi’s  strategy  for  accepting 
bets  must  be  a  function  of  its  local  state).  Denoting  by  Bet{(p,  a)  the  rule 
“accept  any  bet  on  (p  with  a  payoff  of  at  least  1/a”,  how  well  does  pi  do 
by  following  Bet{(p,a)  when  its  local  state  is  Sf?  Clearly  Pi  will  win  some 
bets  and  lose  others,  so  we  are  interested  in  computing  pi’s  expected  profit. 
This  in  turn  depends  on  p,-’s  strategy.  This  leads  us  to  compute,  for  each 
of  pj’s  strategies  /,  agent  pi’s  expected  profit  when  pi  follows  Bet{<p,a)  and 
Pj  follows  /.  Intuitively,  if,  for  each  of  p/s  strategies  /,  agent  Pi’s  expected 
profit  is  nonnegative,  then  p,-  does  not  lose  money  on  average  by  following 
Bet{<p^(x),  regardless  of  p/s  strategy. 

Before  we  can  compute  pi’s  expected  profit,  however,  there  is  an  impor¬ 
tant  question  to  answer:  What  probability  space  should  we  use  to  compute 
this  expectation  at  a  point  c?  One  reasonable  choice  is  to  take  Trea^e]  this 
would  correspond  to  computing  this  expectation  with  respect  to  everything 
Pi  knows.  Another  reasonable  choice  would  be  to  taJce  Tree\^c‘  The  intuition 
would  be  that  p,-  wants  to  do  well  for  every  possible  choice  of  what  pj  could 


do  to  Pi.  The  sets  Tree{  ,.  correspond  to  the  different  things  pj  could  do,  since 
Pj-’s  strategy  is  a  function  of  its  local  state.  For  definiteness,  we  take  the 


expectation  with  respect  to  the  probability  space  li®re,  and  then  show 
that  our  results  would  not  have  been  affected  (at  least  in  the  synchronous 
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setting)  if  we  had  chosen  the  space  Treci^c  instead. 

Let  the  value  of  the  random  variable  Wf  =  Wf{ip,a)  at  a  point  d  denote 
Pi’s  profit  (or  winnings)  at  d,  assuming  p,  is  following  Bet{(p,a)  and  pj  lo 
following  /.  Assume  that  tp  is  measurable  with  respect  to  S*.  Let  £^i,c[W^/]  = 
^Tree>-  denote  the  expected  value  of  Wf  with  respect  to  the  probability 

space  Tree\,..  We  say  Pi  breaks  even  with  Bet{tp,  a)  at  c  if  Ei,e[Wf]  >  0  for 
every  strategy  /  for  pj.  We  say  the  rule  Bet{(p,a)  is  safe  for  pi  at  c  if  pi 
breaks  even  with  Bet{<p,  a)  at  all  points  pi  considers  possible  at  c. 

To  justify  our  definition  of  safe  bets,  we  now  prove  that  the  definition 
remains  unchanged  if  we  take  the  expectation  with  respect  to  2Veei,c  instead 
of  JVeeJ_g.  We  define  TreeJ  ,,-sa/e  to  mean  safe  as  defined  above,  and  Treei,c- 
safe  just  as  we  defined  safe,  except  that  now  we  take  the  expectation  with 
respect  to  7Vee,-,c  instead  of  Tree\  ,,. 

Proposition  4.6:  In  a  synchronous  system,  for  all  facts  (f,  all  agents  pi, 
and  all  points  c,  the  rule  Bet{(p,a)  is  Tree, -^c-safe  for  p,-  at  c  iff  Bet{(p,a)  is 
3Vce<^e*safe  for  pj  at  c. 

Our  claim  that  is  the  right  probability  assignment  to  use  when  playing 
against  pj  is  made  concrete  by  the  following  result  which  states  that  V’  de¬ 
termines  for  every  agent  p,-  precisely  what  bets  are  safe  when  betting  against 
Pi- 

Theorem  4.7:  For  all  facts  (p  measurable  with  respect  to  all  agents  pi, 
and  all  points  c,  the  rule  Bet{(p,  a)  is  safe  for  Pi  at  c  iff  c  [=  K“(p. 

We  view  this  as  the  main  result  of  this  chapter.  It  says  that  that  V’ 
determines  precisely  what  bets  are  safe  for  pi  to  accept.  If,  using  the  proba¬ 
bility  assignment  V\  agent  p,-  knows  the  probability  of  (p  is  at  least  a,  then 
Pi  will  at  least  break  even  betting  on  ip  when  the  payoff  is  1/a.  On  the 
other  heind,  if,  using  V\  agent  p,-  considers  it  possible  that  the  probability 
of  (p  is  less  than  a,  then  there  is  a  strategy  pj  can  use  that  causes  p;  to  lose 
money  betting  on  <p  when  the  payoff  is  1/a.  In  other  words,  is  the  right 
probability  assignment  to  use  when  betting  against  pj. 

While  this  theorem  is  stated  only  for  measurable  facts  (p,  remember  that 
Proposition  4.3  assures  us  that  facts  of  interest  are  typically  measurable  in 
synchronous  systems.  In  fact,  the  same  theorem  holds  even  for  nonmeasur- 
able  facts,  once  we  define  an  appropriate  notion  of  expectation  for  such  facts; 
we  consider  this  notion  in  Appendix  4.B.2 
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The  proof  of  Theorem  4.7  depends  only  on  the  fact  that  is  induced 
by  and  is  actually  independent  of  the  particular  transition  probability 
assignment  r  determining  the  distribution  on  runs.  In  this  sense  it  is  really 
that  is  determining  what  bets  are  safe  for  pf  to  accept.  We  can  formalize 
this  intuition  as  follows.  We  say  that  a  standard  sample  space  assignment  S 
determines  safe  lets  against  pj  in  a  system  consisting  of  unlabeled  computa¬ 
tion  trees  if,  for  all  transition  probability  assignments  r  assigning  transition 
probabilities  to  edges  of  the  computation  trees,  the  following  condition  holds 
for  the  probability  assignment  V  induced  by  S  imd  r: 

7^,  c  1=  implies  Bet{<p,  a)  is  safe  for  Pi  at  c 

for  all  facts  tp  €  jC($),  all  agents  pi,  and  all  points  c.  Notice  that  this 
defirition  quantifies  over  all  trainsition  probability  assignments  r,  requiring 
that  the  probability  assignment  induced  by  S  determines  safe  bets  regardless 
of  the  actual  choice  of  r.  Our  intuition  says  that  the  “right”  way  to  go  about 
constructing  a  probability  assignment  should  not  depend  on  the  details  of 
the  transition  probabilities.  We  would  like  some  uniform  way  of  choosing  the 
probability  space  that  does  not  change  if  there  are  small  perturbations  in  the 
probability;  Theorem  4.7  shows  us  that  it  is  always  possible  to  construct  an 
assignment  in  this  way. 

While  the  proof  of  Theorem  4.7  shows  that  5’  determines  safe  bets  against 
Pj,  it  turns  out  that  there  are  other  assignments  that  determine  safe  bets 
against  pj.  If  the  language  £($)  is  sufficiently  rich,  however,  so  that  there 
are  a  lot  of  possible  events  that  can  be  bet  on,  then  enjoys  the  distinction 
of  being  the  meiximum  such  assignment. 

Theorem  4.8:  In  a  synchronous  system,  if  is  a  consistent  standard  as¬ 
signment,  then 

(a)  if  <  S^,  then  S  determines  safe  bets  against  p,-,  and 

(b)  if  S  determines  safe  bets  against  pj  amd  £($)  is  sufficiently  rich,  then 

S  <  S>. 

We  interpret  Theorems  4.7  and  4.8  as  providing  strong  evidence  that  is 
the  right  sample  space  assignment,  and  hence  that  is  the  right  probability 
assignment,  to  use  when  playing  against  am  opponent  with  pfs  knowledge. 
It  says  that  the  only  way  for  p,-  to  be  guaranteed  it  is  using  a  safe  betting 
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strategy  against  pj  is  by  assuming  the  opponent  is  at  least  as  powerful 
Pj.  Intuitively,  the  more  powerful  the  opponent  the  less  confident  the  agent 
is  that  it  will  be  able  to  win  a  bet  with  this  opponent,  and  the  higher  the 
payoff  the  agent  will  require  before  accepting  a  bet.  Consequently,  pi  is  being 
unduly  conservative  if  it  takes  a  probability  assignment  that  corresponds  to 
an  agent  that  is  more  powerful  than  pj  since  it  may  pass  up  bets  it  should 
accept. 

In  the  process  of  making  this  intuition  precise,  we  can  prove  a  theorem 
that  gives  us  further  insight  into  relationships  between  sample  space  assign¬ 
ments  on  the  lattice.  Recall  that  we  have  defined  K°<p  to  mean  agent  p; 
knows  a  is  a  lower  bound  on  the  probability  of  y?.  We  can  extend  this  def¬ 
inition  to  deal  with  intervals  in  a  straightforward  way.  We  would  like  to 
define  to  mean  Ki(a  <  Pri(ip)  <  j3),  which  should  mean  agent  pi 

knows  the  probability  of  tp  is  somewhere  between  a  and  /3.  Since  (p  may 
not  correspond  to  a  measurable  set,  what  we  really  mean  is  that  the  inner 
measure  of  (p  is  at  least  a  and  the  outer  measure  is  at  most  (3.  Since  we 
interpret  Pvi  as  inner  measure  when  <p  does  not  correspond  to  a  measurable 
set,  and  since  /i*(r)  =  1  —  /i,(r®)  for  any  set  T,  we  can  capture  this  in¬ 
tuition  in  terms  of  our  language  by  interpreting  as  an  abbreviation 

for  Ki[{PTi{p))  >  a)  A  (Pr,-(-iy))  >  1  — /3,)].  To  relate  this  definition  to  our 
earlier  definition  of  Kfp,  notice  that  iff  9?  is  equivalent  to  We  can 

now  prove  the  following. 

Theorem  4.9;  In  a  synchronous  system,  if  V  and  V  are  consistent  standard 
assignments  satisfying  V  <  V,  then 

(a)  for  every  fact  p,  every  agent  pf,  every  point  c,  and  all  a,y0  with  0  < 
a  <  /?  <  1,  we  have 

V,  c  t=  if!“’^V  impUes  P,  c  f=  K^'^p, 


^^Strictly  speaking,  we  should  justify  the  fact  that  Pi  should  use  a  rule  of  the  form 
Bet{(p,  a)  in  order  to  determine  when  to  accept  a  bet.  After  all,  why  should  such  a  timple 
threshold  function  be  appropriate?  It  is  conceivable  that  a  better  money-meiking  strategy 
might  tell  p,-,  say,  to  accept  a  bet  on  (p  if  the  offered  payoff  is  in  the  interval  [2, 5]  or  [8, 10], 
and  reject  the  bet  otherwise.  It  is  not  hard  to  show,  however,  that  because  we  make  no 


assumption  about  the  strategy  being  followed  by  pj  (other  than  requiring  that  it  be  a 
function  of  py’s  local  state),  this  second  strategy  is  safe  for  p,  at  c  iff  it  is  safe  for  p,  at 


c  to  accept  a  bet  on  p  if  the  offeied  payoff  is  in  the  interval  [2,  cc),  i.e.  if  Bei{p,  1/2)  is 


safe  for  p,  at  c.  Consequently  an  optimal  strategy  may  as  well  be  taken  to  be  a  threshold 


function  like  Bei{(p,a). 


4.5.  PROBABILITY  IN  SYNCHRONOUS  SYSTEMS 


127 


(b)  there  exist  a  fact  ein  agent  Pi,  a  point  c,  and  a,/3  with  0  <  a  <  <  1 

such  that 

P',  c  lit  and  yet  V,c\= 

P',  c  and  yet  P,  c  |= 

If  jC($)  is  sufficiently  rich,  then  <p  €  £($)• 


Part  (a)  shows  that  an  agent’s  confidence  interval  does  not  increase  in  the 
presence  of  a  more  powerful  opponent;  part  (b)  shows  that  it  might  actually 
decrease.  The  formula  ip  from  part  (b)  gives  an  example  of  a  case  that 
agent  p,-  might  be  unduly  conservative  by  using  an  inappropriate  probability 
assignment:  using  P',  agent  p,-  would  reject  bets  on  (p  with  payoff  l/a  even 
though  it  should  be  accepting  all  such  bets. 

Our  results  show  that  P'*“  has  a  special  status  among  probability  assign¬ 
ments.  It  is  a  maximum  assignment  among  consistent  assignments  in  the  lat¬ 
tice  with  the  <  ordering,  and  so,  by  Theorem  4.9,  gives  the  sharpest  bounds 
on  the  probability  interval  among  all  consistent  probability  assignments.  In 
addition,  any  other  consistent  probability  assignment  can  be  obtained  from 
P'***  by  a  process  of  conditioning.  Finally,  P'*“  is  the  probability  assign¬ 
ment  that  corresponds  to  what  decision  theorists  seem  to  use  when  referring 
to  an  agent’s  subjective  (or  posterior)  probability.  However,  as  we  have  seen, 
P'®'*  may  not  always  be  the  “right”  probability  assignment  to  use.  The  right 
choice  depends  on  the  knowledge  of  the  opponent  offering  us  the  bet  in  the 
system  we  wish  to  analyze.  Although  P'"*‘  may  give  a  smaller  interval  than 
P^'  (intuitively  giving  sharper  bounds  on  an  agent’s  belief  a  fact  is  true),  if 
Pi  uses  the  better  lower  bound  from  P'"'*  as  a  guide  to  deciding  what  bet 
to  accept  from  Pj,  it  may  wind  up  losing  money.  In  fact,  it  follows  from 
Theorems  4.8  and  4.9  that  P^‘  is  the  probability  assignment  that  gives  an 
agent  the  best  interval  and  still  guarantees  a  good  betting  strategy. 

Even  in  cases  where  P’’®“  is  the  “right”  choice,  it  is  not  necessarily  the 
probability  we  want  to  use  in  computations.  It  may  not  always  be  necessary 
to  obtain  the  sharpest  interval  of  confidence  possible.  A  rough  bound  may 
be  sufficient.  Theorem  4.9  shows  that  proving  a  lower  boimd  on  an  agent’s 
confidence  using  a  certain  choice  of  probability  space  implies  the  same  bound 


holds  With  any  dcfimticn  higher  in  the  lai.t.ice.  The  advajui.a^e  oi  using  a 


probability  assignment  that  lies  lower  in  lattice  is  that,  because  the  individual 


probability  spaces  are  smaller,  the  computations  may  be  simpler.  Consider 
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the  definition  for  example.  Here  the  probability  space  we  associate  with 
a  point  (r,  k)  consists  only  of  points  (r',  k)  having  the  same  global  state  as 
(r,  k).  The  runs  r'  are  the  runs  extending  the  global  state  r{k).  This  means 
we  can  reason  about  the  probability  of  a  future  event  given  a  fixed  global 
state.  In  contrast  a  definition  such  as  V*"'*  allows  for  the  possibility  that  the 
runs  r'  may  extend  any  of  a  collection  of  global  states,  which  may  mean  we 
no  longer  have  the  luxury  of  arguing  about  the  probability  of  a  future  event 
given  a  fixed  global  state.  When  arguing  about  the  level  of  confidence  of  an 
agent,  it  seems  best  to  choose  a  definition  as  low  in  the  lattice  as  possible  to 
make  the  proof  as  simple  as  possible,  but  high  enough  to  enable  one  to  prove 
a  sufficiently  high  level  of  confidence. 


4.6  Probability  in  asynchronous  systems 


We  now  turn  our  attention  to  choosing  appropriate  probability  assignments 
in  asynchronous  systems.  We  remaurk  that  even  in  the  context  of  asyn¬ 
chronous  systems,  the  four  sample  space  assignments  discussed  in  the  previ¬ 
ous  section — <S^“,  S\  and  — still  make  perfect  sense.  The  intuition 

motivating  these  definitions  remains  the  same;  in  particular.  Theorem  4.7 
which  says  that  S’  determines  safe  bets  against  still  holds. 

A  number  of  things  do  change,  however.  For  one  thing.  Proposition  4.3 
no  longer  holds,  so  many  facts  of  interest  become  nonmewurable.  Equally 
important.  Proposition  4.5,  which  says  that  probability  assignments  further 
down  in  the  lattice  can  all  be  obtained  by  conditioning  from  probability 
assignments  higher  in  the  lattice,  also  fails  in  general.  The  reason  it  may  fail 
is  that  if  S'  <  S,  we  are  no  longer  guaranteed  that  is  a  measurable  subset 
of  Si,c-  For  example,  although  V’  <  'p'®**,  Tree^^g.  need  not  be  a  measurable 
subset  of  Treei,e.  If  Pj  can  distinguish  time  1  points  from  time  2  points  but 
Pi  cannot,  and  if  c  is  a  time  1  point,  then  Treei^,,  consists  only  of  the  time  1 
points  while  Treei,e  consists  of  the  time  1  and  2  points;  in  this  case,  Treej ,. 
is  not  a  measurable  subset  of  Treei,c-  All  our  conditioning  arguments  used 
this  measurability  assumption.  Consequently,  it  is  no  longer  true  that  all 


consistent  assignments  can  be  obtained  by  conditioning  on  P'®**.  For  similar 
reasons,  in  general  asynchronous  systems,  using  Treei^c  and  using  Tree\  g.  in 
the  definition  of  a  safe  bet  does  not  necessarily  give  the  same  results.  (The 


conditional  probability  argument  used  in  the  proof  of  Proposition  4.6  depends 
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on  the  fact  that  the  sets  Tree\  ,.  are  measurable  subsets  of  JVeej,c.)  We  can 
prove  analogues  of  Propositions  4.5  and  4.6  as  well  as  Theorem  4.9,  provided 
we  assume  that  S'  <  S  and  that  5,' is  a  measurable  subset  of  Si,e  for  all 
agents  p,-  and  points  Unfortunately,  as  we  shall  see,  this  measurability 
requirement  does  not  hold  in  many  cases  of  interest. 

The  situation  is  perhaps  best  illustrated  by  an  example.  Consider  a  simple 
asynchronous  system  in  which  agent  pi  tosses  a  fair  coin  10  times  and  halts; 
agents  P2  and  pa  do  nothing  and  never  learn  the  outcome  of  the  coin  tosses. 
This  system  consists  of  a  single  computation  tree,  a  complete  binary  tree  of 
depth  10  with  every  transition  labeled  1/2.  Suppose  agent  pa  does  not  have 
access  to  a  clock,  and  so  is  unable  to  distinguish  any  of  the  global  states  in 
the  tree.  On  the  other  hand,  pa  does  have  a  clock,  and  so  can  tell  each  time 
apart. 

There  are  clearly  2^°  possible  runs  in  the  system,  one  corresponding  to 
each  of  the  possible  sequences  of  coin  tosses.  Since  pa  cannot  distinguish 
any  point  on  any  of  these  runs,  for  every  point  c,  the  set  82“*  consists  of 
every  point  in  the  system.  Which  subsets  of  S2°e  are  measurable?  Since 
the  computation  tree  is  finite,  each  individual  run  is  a  measurable  set,  so  all 
sets  of  runs  are  measurable.  And  since  the  measurable  subsets  of  82"^  are 
obtained  by  projecting  measurable  subsets  of  runs  onto  (S'!"*,  the  sets  in 
are  those  consisting  of  all  the  points  on  some  set  of  runs  in  the  computation 
tree. 

Let  If  be  the  fact  “the  most  recent  coin  toss  landed  heads”.  Although 
this  is  a  fact  about  the  global  state,  the  set  of  points  where  it  is  true  is  not 
a  measurable  subset  of  iSj"*,  since  it  does  not  consist  of  all  the  points  on 
some  subset  of  runs.  This  already  shows  that  Proposition  4.3  fails  in  this 
case.  Thus,  we  cannot  talk  about  the  probability  that  P2  knows  y?  at  a  point 
c  in  the  tree.  We  can  talk  about  the  inner  and  outer  measure  of  Sl^cWji 
however.  Since  the  only  nontrivial  measurable  set  contained  in  S2,c{^)  is  the 
set  of  points  on  the  single  run  in  which  the  coin  leinds  heads  every  time,  the 
inner  measure  of  this  set  is  1/2^°;  similarly,  the  outer  measure  is  1  —  (1/2^°). 

While  values  such  as  1/2^°  and  1  —  (1/2^°)  may  seem  somewhat  strange 

pait  (b)  of  this  analogue  of  Theorem  4.9,  we  must  also  strengthen  the  definition  of 
sufficiently  rich  to  mean  that  for  every  global  state  there  is  a  primitive  proposition  in  $ 
true  at  all  points  of  all  runs  passing  through  this  global  state.  This  is  due  to  the  fact  that 
consistent  assignments  in  asynchronous  systems  allow  a  set  Si^c  to  contain  more  than  one 
point  of  a  given  run. 


130  CHAPTER  4.  KNOWLEDGE,  PROBABILITY,  ADVERSARIES 


at  first  glance,  they  are  not  totally  unmotivated.  Consider  the  situation  of 
agent  P2  at  a  point  c  trying  to  figure  out  the  probability  of  heads,  given  only 
the  probability  on  the  runs.  Agent  pj  has  no  idea  which  run  it  is  in.  The 
only  run  in  which  it  is  always  the  case  that  the  most  recent  coin  toss  landed 
heads  is  the  run  where  the  coin  lands  heads  on  every  toss:  this  run  occurs 
with  probability  1/2^°.  On  the  other  hand,  in  all  the  runs  except  for  the  one 
in  which  the  coin  lands  tails  on  every  toss,  it  is  possible  that  the  most  recent 
coin  toss  landed  heads.  Thus,  in  a  set  of  runs  of  probability  1  —  (1/2^°),  it  is 
possible  that  the  most  recent  coin  toss  landed  heads.  This  means  that  1/2;^° 
and  1  —  1/2^° — the  inner  and  outer  measure  of  S^^eXv^) — provide  lower  and 
upper  bounds  on  the  probability  of  being  in  a  run  where  the  most  recent  coin 
toss  landed  heads. 

Now  suppose  that  agent  P2  is  betting  against  pa.  Since  pa  knows  what 
the  time  is,  the  sets  51  consist  of  all  the  time  k  points.  With  respect 
to  the  sample  space  assignment  5^,  the  fact  99  is  measurable.  In  fact,  it’s 
easy  to  see  that  /i®(52,c(y’))  =  1/2  for  all  points  c.  To  sum  up,  we  have 
P'-Sc  h  and  1=  while  P^,cj= 

This  may  seem  somewhat  counterintuitive,  since  it  seems  to  suggest  that 
P2  must  play  more  conservatively  against  a  copy  of  itself  than  against  pa, 
who  knows  more.  This  is  especially  so  since  there  is  another  line  of  reasoning 
about  this  situation  which  would  lead  pa  to  conclude  that  it  knows  that 
the  probability  that  the  most  recent  coin  toss  landed  heads  is  1/2,  even 
without  considering  pa.  Agent  pa  reasons  as  follows:  “The  current  time  is  k, 
although  I  do  not  know  what  k  is.  Regardless  of  the  particular  value  of  k, 
the  probability  that  the  k^^  coin  toss  lands  heads  is  1/2,  and  hence  I  know 
the  most  recent  coin  toss  landed  heads  with  probability  1/2.”  The  sample 
space  Jissignment  that  captures  this  intuition  would  associate  with  the  point 
(r,  k)  and  agent  pa  the  set  of  time  k  points  in  (r,  fe)’s  computation  tree  agent 
Pa  considers  possible  at  (r,  k)  (as  opposed  to  considering  all  the  points  in  the 
computation  tree  that  pa  considers  possible,  as  is  done  by  P'”**).  But  this  is 
precisely  the  assignment  5^! 

In  order  to  understand  this  situation  a  little  better,  let  us  reconsider  the 
assignment  p*’®**.  We  claim  that  the  reason  the  interval  [1/2^°,  (1  —  1/2^°)] 


*~Noie  ihai  iois  uocS  not  contradict 


1.9,  sines  ThsorsE  4.9  would  hold  only 


if  is  a  measurable  subset  of  5{,e  for  all  p,  and  c,  which  we  have  already  noted  is  not 
the  case. 
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arises  here  is  different  from  the  reason  intervals  arise  in  the  context  of  the 
synchronous  systems  studied  in  the  preceding  section.  In  the  context  of 
synchronous  systems,  because  pj's  strategy  depends  on  its  local  state  and 
Pi  does  not  know  which  local  state  pj  is  currently  in,  pi  has  to  partition 
/Ci(c)  and  view  each  element  of  the  partition  as  an  independent  probability 
space,  computing  the  probability  of  separately  in  each  one.  A  formula 
such  as  holds  when  the  probability  of  (p  can  range  from  a  to  in  the 

different  probability  spaces.  In  our  current  example,  however,  there  is  only 
one  probabilitj'  space;  the  interval  arises  because  of  the  nonmeasurability  of 
(p.  Depending  on  how  “lucky”  pi  is  in  the  choice  of  where  in  each  run  it 
tests  for  heads,  the  probability  of  getting  heads  could  range  from  1/2^°  to 
l-(l/2i°). 

We  can  view  the  nonmeasurability  that  arises  due  to  asynchrony  as  a  new 
element  of  uncertainty  that  an  adversary  can  exploit.  Intuitively,  in  the  coin 
tossing  example,  when  pi  plays  against  (a  copy  of)  itself,  since  pi  does  not 
know  where  in  the  run  it  is,  an  adversary  gets  to  choose  that.  On  the  other 
hand,  when  playing  against  pa,  at  least  pi  knows  that  all  the  worlds  in  a  given 
sample  space  are  time  k  points,  for  some  fixed  k.  We  can  view  our  analysis 
where  we  obtain  the  answer  1/2  withoui  invoking  pa  as  implicitly  assuming 
an  adversary  who  chooses  the  time  k  the  test  for  <p  is  to  be  performed.  Such 
an  adversary  is  an  adversary  of  the  third  type  mentioned  in  the  introduction. 
Given  any  time  k  chosen  by  this  adversary,  the  probability  of  is  1/2. 


We  can  formalize  this  analysis  as  follows.  With  each  time  k  we  associate 
a  separate  computation  tree  corresponding  to  the  adversary  Ak  choosing 
time  k  to  test  for  (p.  The  probability  space  for  P2  at  each  point  in  the  tree 
corresponding  to  Ak  consists  of  the  time  k  points  in  the  tree,  each  of  which  is 
assigned  equal  probability.  In  each  of  these  probability  spaces  the  probability 
of  heads  is  1/2,  so  P2  knows  that  the  most  recent  coin  toss  landed  heads  with 
probability  1/2. 


There  is  no  reason,  however,  to  restrict  this  third  type  of  adversary  to 
simply  making  an  initial  choice  of  the  stopping  time.  Suppose  we  have  fixed 
a  collection  of  adversaries  of  the  first  type  (the  computation  trees)  and  an 
adversary  of  the  second  type  (say  pj).  We  define  a  cut  through  TrecJ-,.  to 


O  Ctl 


Kqot  r>f  Trszi  c  containing  precisely  one  point  froiu  eve*  j  raa 


through  TreCi/.  every  run  passing  through  Tree’ll  Is  cut  precisely  once  by 
such  a  set  of  points.  We  define  a  type  three  adversary  to  be  a  function 


132  CHAPTER  4.  KNOWLEDGE,  PROBABILITY,  ADVERSARIES 


mapping  an  agent  Pi  and  a  point  c  to  a  cut  through  IVee|_g.  Intuitively,  pi 
and  pj  are  betting  on  a  fact  <p,  but  neither  knows  precisely  where  in  the 
run  the  bet  is  taking  place;  it  is  the  third  type  of  adversary  who  determines 
where  in  the  run  the  bet  is  actually  made.  The  cut  through  JVecJ chosen 
by  the  adversary  is  the  set  of  points  at  which  the  adversary  will  cause  the 
bet  to  take  place  when  the  local  states  of  p,-  and  pj  are  given  by  c. 

In  the  example  above,  when  pi  plays  against  a  copy  of  itself,  the  adversary 
';hooses  one  cut  per  computation  tree,  since  pi  considers  all  points  in  the 
computation  tree  possible.  In  the  case  of  pi  playing  against  pa  (who  knows 
the  time),  the  adversary  chooses  one  cut  for  every  time  fc;  this  cut  must  in 
fact  consist  of  all  time  k  points  in  the  tree.  (In  general,  if  we  are  considering  a 
set  of  time  k  points,  the  only  allowable  cut  is  the  one  consisting  of  all  points. 
This  is  why  the  issue  of  an  adversary  choosing  such  cuts  does  not  arise  when 
considering  synchronous  systems.) 

To  make  formal  sense  of  this,  suppose  we  are  given  a  set  A  of  type  one 
adversaries  (determining  the  possible  initial  nondeterministic  choices).  This 
determines  a  set  of  computation  trees,  as  we  have  already  discussed.  Fix  a 
type  two  adversary,  say  pj.  Let  C  be  a  set  of  type  three  adversaries  in  this 
collection  of  computation  trees  (so  that  the  adversaries  in  C  choose  stopping 
times).  Notice  that  the  definition  of  C  depends  on  A  and  pj.  We  can  then 
construct  one  computation  tree  7a, c  for  each  A£  A  and  C  ^C.  For  a  fixed 
A  €  A,  the  computation  trees  Ta,c  look  identical  (essentially  just  like  Ta) 
for  2dl  choices  of  (7  €  C  except  that  we  put  C  into  the  environment  state  at 
each  point  in  Ta,c-  The  sample  space  assignment  maps  an  agent  p,  and 
a  point  c  of  a  tree  Ta,c  to  a  sample  space  5?^  C  3VeeJ_„  such  that  for  each 
run  r  €  TZ{Treel^,.),  exactly  one  point  (r,k)  €  Treel,.  is  in  5?^.  Intmtively, 
this  is  the  point  in  r  where  the  test  is  performed.  Note  that  if  we  consider 
two  adversaries  C,C'  qC  and  two  corresponding  points  c  and  cf  in  Ta,c  and 
Ta,c',  the  sample  spaces  Si,e  and  5,-, o'  used  by  at  these  two  points  will  in 
general  be  different:  at  c,  it  is  C  that  determines  at  which  point  in  each  run 
in  the  tree  that  pi  considers  possible  at  c  the  test  will  be  performed,  while 
at  d  it  is  C  that  makes  this  determination.  Notice  that,  in  the  presence  of 
this  third  type  of  adversary,  it  is  no  longer  the  case  that  all  sample  space 
assignments  defined  in  asynchronous  systems  are  standard  assignments  as 


they  are  in  synchronous  systems.  For  example,  it  no  longer  need  be  the  case 
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third  type  of  adversary.  To  make  this  precise,  once  we  fix  a  set  of  adversaries 
of  the  first  type  A  and  consider  the  resulting  system,  we  can  take  pts{A)  to 
be  the  set  of  all  possible  adversaries  of  the  third  type  in  this  system. 

Proposition  4.10:  [=  [=  every  fact  y?, 

agent  p,-,  and  point  c. 


The  proof  of  this  result  shows  that  can  be  understood  in  asynchronous 
systems  in  terms  of  an  adversary  that  chooses  as  the  time  for  the  test  to  be 
performed  the  worst  possible  time  from  pi’s  point  of  view.^® 

Of  course,  there  is  no  reason  to  assume  that  a  type  three  adversary  must 
either  be  restricted  to  choosing  horizontal  cuts  of  time  k  points  or  be  allowed 
to  choose  completely  arbitrary  cuts  of  points.  Other  intermediate  definitions 
seem  plausible  as  well.  One  can  imagine  a  partially  synchronous  model  in 
which  processors  cannot  tell  time  but  are  guaranteed  that,  for  every  As,  all 
processors  taJce  their  step  within  some  time  interv^ll  of  width  6.  It  would 
seem  reasonable  to  require  the  adversary  of  the  third  type,  rather  than  se¬ 
lecting  horizontal  time  k  cuts  or  totally  arbitrary  cuts,  to  select  cuts  with 
the  property  that  every  point  in  the  cut  is  a  time  k  point  for  some  k  falling 
in  some  interval  of  width  8.  We  can  also  generalize  the  notion  of  type  three 
adversary  slightly  so  as  not  to  require  that  it  choose  a  cut,  but  rather  have  it 
choose  at  most  one  point  per  run.  The  intuition  here  is  that  this  adversary 
simply  does  not  give  p;  the  chance  to  bet  in  certain  runs.  In  our  coin  tossing 
exeunple,  such  an  adversary  could  allow  Pi  to  bet  on  heads  only  when  the 
coin  has  landed  tails.  The  issue  of  defining  reasonable  adversaries  of  the  third 
type  deserves  further  study. 

We  close  this  section  with  a  comparison  of  our  definition  of  probability  in 
asynchronous  systems  with  that  of  [FZ88].  The  probability  assignment  used 
in  [FZ88]  in  the  asynchronous  setting  has  ^'Uch  the  same  flavor  as  that  of  our 
Rather  than  assuming  that  the  adversary  chooses  at  a  point  c  a  cut  of 
points  through  Treci^,  however,  Fischer  and  Zuck  assume  that  the  adversary 
chooses  a  cut  of  global  states  through  Treei^e',  that  is,  a  set  of  global  states 
appearing  in  Treci,c  with  the  property  that  no  two  global  states  lie  on  the  the 
same  run.  Intuitively,  this  means  that  if  the  adversary  performs  the  test  at 


Another  interpretation  of  this  result  is  that  the  language  obtained  by  closing  a  set 
of  fcrmulaG  under  the  standard  bcolcaxi  ccuucCbivcS  ttud  i.nc  mooai  operators  cannot 
distinguish  the  assignments  T*"*  and  We  note  that  the  richer  language  of  [FH88] 
can  distinguish  these  assignments. 
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one  point,  it  performs  the  test  at  all  other  points  with  the  same  global  state. 
This  seems  like  a  reasonable  restriction,  but  it  leads  to  some  unexpected 
consequences. 

Let  us  call  the  class  of  adversaries  considered  in  [FZ88]  state,  and  let  the 
corresponding  probability  assignment  be  P**“‘*.  Rather  than  giving  formal 
definitions  here,  we  give  an  example  to  show  how  'p'*®**  differs  from  P***'. 
Consider  a  system  in  which  pi  tosses  a  biased  coin  which  lands  heads  with 
probability  .99  and  tails  with  probability  .01.  The  system  consists  of  two 
runs  we  can  denote  by  h  and  t  and  four  points  corresponding  to  times  0 
and  1  in  runs  h  and  t.  The  computation  tree  has  only  three  nodes,  a  root 
a,  encoding  the  points  (fi,0)  and  (f,0),  a  node  6  corresponding  to  the  point 
{h,  1),  and  a  node  c  corresponding  to  {t,  1).  Suppose  P2  is  able  to  distinguish 
only  the  point  {h,  1)  from  the  remaining  three  points  and  suppose  that  <p  is 
the  fact  “the  coin  lands  heads”  (so  that  tp  is  true  at  {h,  0)  and  {h,  1),  and  false 
elsewhere).  Let  c  be  a  time  0  point,  say  (t,  0),  and  consider  the  probability 
with  which  p,-  knows  <p  with  respect  to  P****  and  P'*"**,  An  adversary  in 
pts  can  either  choose  {(/i,0),(t,0)}  or  {(h,0),(t,  1)}  as  the  set  of  points  to 
perform  the  experiment;  (p  is  true  with  probability  .99  with  respect  to  both 
sets.  It  follows  that  'P^*‘,c  |=  K^tp',  in  fact  we  have  P*’‘*,c  ^ 

Similarly,  an  adversary  in  state  can  choose  either  the  node  a  or  the  node  c  as 
a  state  at  which  to  perform  the  experiment,  since  these  are  the  cuts  of  global 
states  contained  in  {o,  c}.  The  choice  of  a  corresponds  to  the  adversary  in 
pts  that  chooses  {(fi,  0),  {t,  0)}.  However,  the  choice  of  c  does  not  correspond 
to  {(fi’)0),(t,  1)}.  In  fact,  there  is  no  adversary  in  state  corresponding  to 
this  adversary  in  pts,  since  it  would  amount  to  choosing  the  nodes  a  and  c, 
both  of  which  lie  on  the  same  run.  With  respect  to  the  choice  a,  p  holds 
with  probability  .99;  with  respect  to  the  choice  c,  p  holds  with  probability  0. 
Thus,  we  get  P*‘“‘',c  |=  p.  In  some  sense  it  seems  that  is  giving 
the  more  reasonable  answer  here.  Since  P2  knows  that,  a  priori,  the  coin 
will  land  heads  with  high  probability,  and  its  information  has  not  eliminated 
either  run,  it  should  still  consider  heads  extremely  probable.^^ 


^^Note  that  this  example  also  shows  that  the  adversaries  in  state  are  examples  of  the 
more  genereil  auveisaiies  discussed  above,  that  do  not  necessarily  CxtCcsc  one  point  psr 
run.  For  example,  the  adversary  choosing  the  global  state  c  does  not  choose  a  point  in  the 
run  h. 
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4.7  An  application:  coordinated  attack 


As  an  example  of  how  probabilistic  knowledge  can  be  used  to- analyze  pro¬ 
tocols,  and  of  how  heavily  statements  made  about  protocols  depend  on  the 
particular  definition  of  probabilistic  knowledge  used,  we  now  apply  the  dif¬ 
ferent  probability  assignments  defined  in  the  context  of  synchronous  systems 
to  understanding  probabilistic  coordinated  attack  as  defined  in  Section  4.3. 
In  [HM84]  it  is  shown  that  a  state  of  knowledge  called  common  knowledge  is 
a  necessary  condition  for  coordinated  attack.  Recall  that  a  formula  ip  is  com¬ 
mon  knowledge  if  all  agents  know  <p,  all  agents  know  all  agents  know  <p,  and 
so  on  ad  infinitum.  In  the  same  paper  it  is  shown  that  common  knowledge 
of  nontrivial  facts  cannot  be  attained  in  systems  where  there  is  no  upper 
bound  on  message  delivery  time  (and,  in  particular,  in  asynchronous  sys¬ 
tems),  and  hence  that  coordinated  attack  is  not  possible  in  such  systems. 
We  now  examine  the  relationship  between  probabilistic  common  knowledge 
and  probabilistic  coordinated  attack. 

Recall  from  Chapter  2  that  common  knowledge  is  defined  as  follows. 
Given  a  set  G  C  {pi,...,Pn}  of  agents,  we  define  everyone  in  G  knows 
(p  by  EaP  =  Apj€G  Ki<p.  Defining  Ej^<p  inductively  by  E^p  =  p  and  Ej^p  = 
EgEq~'^P^  we  define  p  is  common  knowledge  to  G  by  CaP  =  Ak>o^a’P' 
Recall  that  common  knowledge  satisfies  the  following  statements: 

1.  the  fixed  point  axiom:  Cap  =  Ea{p  A  Cap). 

2.  the  induction  rule:  From  ^  D  Ea{i>  A  p)  infer  -0  D  CgP- 


The  first  statement  says  that  CaP  is  a  fixed  point  of  the  equation  X  = 
Eo{p  A  X).  In  fact,  it  can  be  shown  to  follow  from  the  induction  rule  that 
Cop  is  the  greatest  fixed  point,  and  thus  is  implied  by  all  other  fixed  points 
of  this  equation  [HM85]. 

By  direct  analogy,  probabilistic  common  knowledge  is  defined  in  [FH88] 
as  the  greatest  fixed  point  of  the  equation  X  =  E%{p  A  X),  where  E%p  = 
ApieG  K“p.^^  It  is  easy  to  show  that  the  definition  of  C°p  satisfies  the 
obvious  analogues  of  the  fixed  point  axiom  and  induction  rule  given  above. 


-sM’v-Isst  to  the  infinite  ccnjnncticn  of 


is  shown  in  [FHSS],  this  dsfinition  is  not 

{Eg)^(p,  >  0;  however  it  is  equivalent  to  the  infinite  conjunction  of  k  >  0, 

where  we  define  inductively  by  unwinding  the  fixed  point  equation:  {F^Y^tp  =  tp 

and  {FSftp  =  A  V)- 
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Now  consider  the  probabilistic  attack  problem,  and  suppose  ip  is  the  fact 
"A  attacks  iff  B  attacks” .  In  the  original  coordinated  attack  problem,  since 
(f  is  true  at  all  points,  the  induction  rule  implies  Co<p  holds  at  all  points. 
Are  there  implementations  of  the  probabilistic  attack  problem  where  C“(p 
holds  at  all  points?  The  answer  depends  on  the  choice  of  probability  assign¬ 
ment.  Stronger  assignments  yield  stronger  notions  of  probabilistic  common 
knowledge  which  make  stronger  requirements  of  the  implementation. 

Consider  the  assignment  Here  the  opponent  offering  an  agent  a  bet 
knows  the  entire  global  state  at  every  point.  If  there  is  any  point  where 
the  attack  is  uncoordinated,  then  no  run  extending  this  point  can  satisfy 
p.  At  this  point  <p  holds  with  probability  0  (according  to  so  it  easily 

follows  that  C^(p  cannot  hold  at  all  points.  This  says  that  an  algorithm 
achieves  probabilistic  coordinated  attack  with  respect  to  V***  iff  it  achieves 
coordinated  attack.  Since  coordinated  attack  is  known  to  be  unattainable  in 
asynchronous  systems,  we  cannot  get  probabilistic  coordinated  attack  either 
with  respect  to  such  a  strong  opponent. 

Next  consider  the  assignment  Here  the  opponent  offering  the  bet 

has  precisely  the  same  knowledge  as  the  agent  itself.  Consequently,  if  it  is 
possible  to  reach  a  point  at  which  the  agent  can  determine  from  its  local  state 
that  no  run  extending  the  point  can  satisfy  tp,  the  agent  knows  (p  does  not 
hold,  and  hence  neither  does  C^p.  Consequently,  our  first  implementation 
CAi  of  the  probabilistic  attack  problem  does  not  have  the  property  that  C%p 
holds  at  all  points  (with  respect  to  but  our  second  implementation  CA^ 
does.  This  can  be  proved  by  first  observing  that  E%p  holds  at  all  points  (with 
respect  to  P»’®“)  and  hence  by  the  induction  rule  (taking  the  formula  ip  in 
the  rule  to  be  true),  so  does  C^p. 

Notice  that  with  respect  to  any  consistent  probability  a.ssignment,  if  at 
some  point  an  agent  in  G  knows  p  does  not  hold,  then  C^p  cannot  hold 
at  this  point  (since  C^p  implies  E“p  by  the  fixed  point  axiom,  while  Ki-'P 
implies  -'E^’P  for  all  i  G  G).  Consequently,  it  cannot  be  the  case  that  C^p 
holds  at  all  points  of  CAi  with  respect  to  any  consistent  assignment.  Is  it 
possible  for  C^p  to  hold  at  all  points  of  CAi  with  respect  to  any  probability 
assignment?  Since  this  algorithm  guarantees  p  holds  with  probability  o, 
taken  over  the  runs,  the  obvious  solution  is  to  make  the  assignment  mimic 
the  probability  distribution  on  the  runs.  In  particular,  consider  'p'”".  It  is 
easy  to  see  that  with  this  assignment,  every  agent  knows  p  with  probability 
a  at  all  points  of  the  system.  Since  E^p  holds  at  all  points,  it  follows  by  the 
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induction  rule  that  C“(p  holds  at  all  points  as  well. 

We  summarize  our  discussion  in  the  following  proposition. 

Proposition  4.11: 

1.  CAi  achieves  probabilistic  coordinated  attack  with  respect  to 
but  not  pro**. 

2.  CAi  achieves  probabilistic  coordinated  attack  with  respect  to  (and 
P'""")  but  not  V^'*. 

3.  A  protocol  achieves  probabilistic  coordinated  attack  with  respect  to 

iff  it  achieves  coordinated  attack,  and  hence  no  such  protocol  exists 
in  which  the  generals  actually  attack. 

This  proposition  shows  how  increasing  the  power  of  the  opponent  (moving 
down  in  the  lattice)  strengthens  the  kind  of  guarantees  that  can  be  made  for 
probabilistic  attack.  Note  that  all  of  the  probability  assignments  agree  at 
time  0,  and  the  probability  they  assign  to  a  set  of  points  is  identical  to  the 
probability  of  the  set  of  runs  going  through  those  points;  i.e.  if  c  is  a  time  0 
point  in  Ta  and  RaW)  is  the  set  of  runs  in  Ta  satisfying  a  fact  about  the 
run,  then 


HW.V))  =  1‘ZVreeiM)  = 

=  t‘lSiPr‘fUv))  = 

However,  at  later  times,  it  is  only  that  agrees  with  the  initial  probability 
on  runs.  Thus,  for  the  other  probability  assignments,  saying  that  (p  holds 
with  probability  greater  than  o  at  all  points  (r,  k)  in  Ta  according  to  pi  will 
generally  be  a  stronger  statement  th^ul  saying  it  holds  with  probability  a 
taken  over  the  runs  of  Ta. 

Of  course,  it  is  perfectly  conceivable  we  might  want  to  consider  probabil¬ 
ity  assignments  besides  those  that  we  have  discussed  above,  which  will  make 
yet  more  guwantees.  Considering  such  intermediate  assignments  might  be 
particularly  appropriate  in  protocols  where  security  is  a  major  considera¬ 
tion,  such  as  cryptographic  protocols.  There  it  becomes  quite  important  to 
consider  the  knowledge  of  the  agent  we  are  betting  against. 
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We  remark  that  a  slightly  different  definition  of  probabilistic  coordinated 
attack  is  considered  in  [FZ88]:  it  is  required  only  that  the  conditional  proba¬ 
bility  that  both  parties  attack  together,  given  that  one  of  the  parties  attacks, 
is  at  least  a.^®  It  is  then  shown  in  [FZ88]  that  this  form  of  probabilistic  co¬ 
ordinated  attack  corresponds  to  all  the  agents  having  average  belief  of  a 
that  the  attack  will  be  coordinated.  We  can  reinterpret  these  results  in  our 
language  as  showing  that  this  notion  of  coordinated  attack  is  equivalent  to 
probabiBstic  common  knowledge  with  respect  to  another  probability  assign¬ 
ment,  much  in  the  spirit  of  In  partictjdar,  the  probability  space  used 

by  [FZ88]  for  this  analysis  is  not  pp®**,  but  an  inconsistent  probability  assign¬ 
ment.  However,  it  should  be  noted  that  one  can  be  led  to  counterintuitive 
results  using  an  inconsistent  probability  assignment.  Consider  'P^'"  in  the 
context  o(  CAi.  Since  there  is  a  point  at  which  the  information  in  agent  A’s 
local  state  guarantees  the  attack  will  not  be  coordinated,  according  to 
both  and  Ka~"P  hold  at  this  point.  In  other  words,  the  choice  of  'p'"'*"’ 
has  the  effect  of  saying  that  at  a  point  an  agent  can  have  high  confidence  in 
a  fact  it  knows  to  be  false. 

The  preceding  discussion  raises  another  interesting  point.  While  it  is  typ¬ 
ically  the  case  that  computer  science  applications  consider  only  probabilities 
over  runs  (such  applications  typically  require  only  that  a  condition  P  hold 
throughout  a  large  fraction  of  the  runs,  which  corresponds  to  it  is 

not  clear  that  this  is  always  appropriate.  If  an  agent  running  a  probabilistic 
coordinated  attack  algorithm  that  is  guaranteed  to  work  with  high  proba¬ 
bility  over  the  runs  finds  itself  in  a  state  where  it  knows  that  the  attack 
will  not  be  coordinated,  then  it  seems  clear  that  it  should  not  proceed  with 
the  attack.  It  may  be  worth  reconsidering  a  number  of  dgorithms  to  see  if 
they  can  be  redesigned  to  give  stronger  guarantees.  This  may  be  particularly 
appropriate  in  the  context  of  zero- knowledge  protocols  [GMR89],  where  the 
current  definitions  allow  a  prover  to  continue  playing  against  a  verifier  even 
when  the  prover  knows  perfectly  well  that  it  has  already  leaked  information 
to  the  verifier,  and  may  continue  to  do  so.  Although  it  is  extremely  unlikely 
that  the  prover  will  find  itself  in  this  situation,  it  may  be  worth  trying  to 
redesign  the  protocol  to  deal  with  this  possibility.  While  adaptive  protocols. 


Although  it  is  not  cieai  uom  the  deuniiion  of  probabilistic  attack  giTcn  in  [FZS8]  over 
what  the  probability  is  being  taken,  the  results  given  clearly  assume  that  the  probability 
is  being  taken  over  the  runs. 
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where  processors  modify  their  actions  in  light  of  what  they  have  learned,  are 
common  in  the  control  theory  literature,  the  probabilistic  sdgorithms  that 
are  used  in  distributed  systems  typically  are  not  adaptive.  It  seems  that  a 
number  of  algorithms  can  be  converted  to  adaptive  algorithms  with  relatively 
little  overhead.  We  hope  to  study  this  issue  more  carefully  in  the  future. 


4.8  Conclusion 

We  have  provided  a  framework  for  capturing  knowledge  and  probability  in 
distributed  systems.  Our  framework  makes  it  clear  that  in  order  for  an 
agent  to  evaluate  the  probability  of  a  formula  y?  at  a  given  point,  we  need 
to  specify  the  adversary  (or,  more  accurately,  adversaries)  that  determines 
the  probability  space.  We  have  described  how  to  choose  the  appropriate 
probability  space  as  a  function  of  the  adversary,  making  no  assumptions 
about  the  strategy  the  adversary  is  following.  One  potentially  fruitful  line  of 
research  is  to  understand  how  our  results  are  effected  if  we  make  assumptions 
about  the  strategies  the  adversary  Pj  is  allowed  to  follow  (such  as  assuming 
that  Pj  is  trying  to  maximize  its  payoff). 

This  use  of  adversaries  may  help  clear  up  a  number  of  subtle  issues  in  the 
study  of  probability,  such  as  what  the  probability  that  a  coin  lands  heads 
is  after  the  coin  has  been  tossed.  In  addition,  our  approach  allows  us  to 
unify  the  different  approaches  to  probability  in  distributed  systems  that  have 
appeared  in  earlier  works.  Of  course,  what  needs  to  be  done  now  is  to  use 
these  definitions  to  analyze  probabilistic  (especially  cryptographic)  protocols! 


4. A  Proofs  of  results 


This  appendix  contains  the  proofs  of  all  results  claimed  in  the  chapter. 


Proposition  4.1;  If  5i,c  is  state-generated  and  satisfies  REQi,  then  5i,c 
satisfies  REQ2. 


Proof;  Given  a  global  state  g,  let  Gg  be  the  set  of  points  (r,  k)  with  r{k)  = 
Qf  and  let  Tvy  be  the  set  of  runs  through  g.  Sy  our  technical  assumptioii 
that  the  global  state  encodes  the  adversary,  each  global  state  is  contained  in 
precisely  one  computation  tree.  Thus,  Gg  and  TZg  are  contained  in  a  single 
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computation  tree,  and  Tig  =  R(Gg).  Since  S  is  state-generated,  Si,c  is  the 
union  of  a  collection  of  sets  of  the  form  Gg.  Since  satisfies  REQi,  it  is 
contained  in  a  single  computation  tree  7x  =  Xa)  ftA)}  ^d  since  a  single 
computation  tree  contains  at  most  a  countable  number  of  global  states,  5{,e 
is  a  countable  union  of  sets  of  the  form  Gg.  Thus,  TZ{Si,e)  is  the  countable 
union  of  sets  of  the  form  TZg  =  with  g  a  global  state  in  Ta-  By 

the  definition  of  Ta,  each  set  TZg  is  a  measurable  set  of  runs  with  positive 
measure,  and  hence  their  countable  union  TZ{Si,c)  must  also  be  a  measurable 
set  with  positive  measure.  It  follows  that  Si,c  satisfies  REQ^.  □ 

Proposition  4.2:  If  5i,c  satisfies  REQ\  and  REQ^,  then  Pi,c  is  a  probability 
space. 

Proof:  We  must  show  (see  [HadSO])  that  Xi^c  is  a  set  of  subsets  of  5i,c  in¬ 
cluding  i5<,c  that  is  closed  under  the  formation  of  complements  and  countable 
unions,  and  that  is  a  nonnegative,  countably  additive  function  on 
satisfying  =  0. 

Let  T(c)  =  {TZa,Xa,Ha)-  Since  Si,e  =  Proj{TlA,Si,c)  and  TZa  G  Xa,  we 
have  Si,e  6  Xi,e.  If  X  £  Xi,e,  then  X  =  Proj{R,  Si,e)  for  some  R£  Xa\  since 
Xa  is  closed  under  complementation,  iZ®  G  Xa  and  X®  =  Proj{B!^,  Si,d)  € 
Xi^i,  and  hence  Xi^c  is  closed  under  complementation.  If  X\,X2,...  is  a 
countable  collection  of  sets  from  A’i,c,  then  Xj  =  Proj{Rj,  Si,e)  for  some 
Rj  G  Xa  for  each  j.  Since  Xa  is  closed  under  countable  union,  R  =  OjRj  G 
Xa-  It  follows  that 

X  =  \JjXj  =  \JjProj{Rj,Si,c)  =  Proj{[JjRj,Si,c)  =  PToj{R,Si,c), 

so  X  G  Xi,c  and  Xi^c  is  closed  under  countable  union. 

Since  5,-,c  is  contained  in  a  single  computation  tree  by  REQi,  and  since 
R'iSifi)  G  Xa  and  >  0  by  REQ2i  conditional  probability  with 

respect  to  TZ{Si,c)  is  well-defined,  and  hence  /ii,c  is  well-defined.  Clearly,  /ii,c 
is  nonnegative  since  {Ia  is.  Furthermore,  ;t»,c(0)  =  pA{^)lpA{'H,{Si,e))  =  0. 
Finally,  suppose  Xi,X2, ...  is  a  countable  collection  of  pairwise-disjoint  sets 
in  Xi^c-  We  know  that  Xj  =  Proj{Rj,  Si,e)  for  some  Rj  G  Xa.  We  can  assume 
every  run  in  Rj  passes  through  5,,c,  or  we  can  replace  Rj  ’,vith  the  measurable 
set  TZ[Sifi)  n  Rj\  and  we  can  assume  the  Rj  are  pairwise  disjoint,  since  if  T 
is  contedned  in  both  Rj  and  Rk  then  some  point  on  r  is  contained  in  both 
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Xj  =  Proj{Rj,  Si,c)  and  Xk  =  Proj{Rk,  Si,e)i  contradicting  the  pairwise¬ 
disjointness  of  Xj  and  X^.  It  follows  from  the  pairwise-disjointness  of  the 
Rj  =  1Z{Xj)  that 

/.  I  V-  N  _  pAjT^iOjXj))  _ 

y  Mnsi^)) 

=  '£i‘UXj), 

3 

and  hence  is  countably  additive.  □ 


Proposition  4.3:  In  a  synchronous  system,  if  c?  is  a  consistent  standard 
assignment  and  iC($)  is  state-generated,  then  ^  is  measurable  with  respect 
to  S  for  all  facts  ip  6  >C($). 

Proof:  Recall  that  £($)  is  state-generated  if  all  the  primitive  propositions 
in  $  are  facts  about  the  global  state.  Recall  also  that  p  is  measurable  with 
respect  to  S  if  Si^d'p)  €  Xi,c  lor  all  agents  Pi  and  points  c.  Fix  an  agent  pi 
and  a  point  c.  Let  Sk  denote  the  set  of  time  k  points  in  the  computation  tree 
containing  c.  We  claim  it  is  enough  to  show  that 

(*)  R{Sk{<p))  is  a  measurable  set  of  runs  for  all  times  k  and  all  formulas 

(p  6 

To  see  this,  notice  that  since  5  is  a  consistent  assignment  in  a  synchronous 
system,  5i,c  contains  only  time  k  points  for  some  k.  Consequently,  we  have 
7^(<S'i,e(y’))  =  R{Si,c)  n  RiSkitp)).  Since  R(Si,e)  is  measurable  by  REQ2, 
condition  (*)  will  imply  7^(5;, c(v?))  is  measurable.  It  will  follow  that  Si,e{ip) 
is  a  meEisurable  subset  of  iSj.e. 

The  proof  of  (♦)  proceeds  by  induction  on  the  structure  of  If  97  is  a 
primitive  proposition  in  §,  then  since  jC($)  is  state-generated  we  know  that 
(p  must  be  a  fact  about  the  global  state.  Arguments  similar  to  those  used  for 
Proposition  4.1,  therefore,  suflice  to  show  that  7^(oa(^))  is  a  meeisurable  set 
of  runs.  The  cases  of  negation  and  conjunction  follow  immediately  from  the 
fact  that  measurable  sets  are  closed  under  negation  and  intersection.  Since 
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Kiip  is  a  fact  about  the  global  state,  the  aiguments  for  such  a  formula  is 
identical  to  the  argument  for  primitive  propositions  above. 

For  a  probability  formula  tjj  of  the  form  Pri{ip)  >  a,  since  we  consider  only 
uniform  sample  space  assignments,  it  is  easy  to  check  that  V*  is  true  at  either 
all  or  none  of  the  points  in  5,-,c;  hence  7^(<Si,e(V’))  must  be  measurable  since 
itself  is  guaranteed  to  be  measurable  by  REQ2.  Since  S  is  inclusive, 
we  know  that  d  €  Si,d  for  every  time  k  point  d.  Since  5  is  consistent,  we 
know  that  5,-,^  contains  only  time  k  points  from  T{d).  It  follows  that  Sk  is  the 
union  of  sets  of  time  k  points  of  the  form  Moreover,  since  S  is  uniform, 
the  Si4  actually  partition  Sk-  Finally,  since  each  5,-,^  is  state-generated  and 
since  there  are  at  most  a  countable  number  of  time  k  global  states  in  any 
given  tree,  we  see  that  Sk  is  partitioned  into  a  countable  collection  of  sets  of 
the  form  Si,d,  and  hence  the  same  is  true  for  Ski'fp)-  It  follows  that  7?.(S'fc(V’)) 
is  partitioned  into  a  countable  collection  of  sets  of  the  form  TZ{Si,d),  and  since 
the  sets  TZ{Si,d)  are  meeisurable,  so  is  their  countable  union  TZ{Sk{ip)). 

For  Q(p,  notice  that  (p  is  true  at  (r,  k+1)  iff  Qip  is  true  at  (r,  k).  It  follows 
that  'R-iSkiO'p))  =  Tl{Sk+i{<p)),  aiid  hence  by  the  inductive  hypothesis  for 
(p  that  'fi{Sk{0<p))  is  a  measurable  set  of  runs.  In  fact,  a  simple  extension 
of  this  argument  (by  induction  on  t)  shows  that  if  7^(5fc(¥>))  is  measurable 
then  so  is 

For  (pUi}),  define  (pUoijj  to  be  the  formula  ip,  and  define  tpUiip  for 
^  >  0  to  be  the  formula  <p  A  ...  A  ^  OV*  It  is  easy  to  see  that 

(pU Ip  is  true  at  a  point  d  HI  ipUiip  is  true  at  d  for  some  ^  >  0.  Thus, 
Sk{(p  U  Ip)  =  Uz>o  Sk{<P  Ut  Ip)  and  hence  Tl{Sk(}P  U  ip))  =  U/>o  Ut  ip)). 

Since  the  induction  hypothesis  holds  for  the  subformulas  (p  and  ip,  the  pre¬ 
ceding  paragraph  shows  that  each  set  R{Sk{<pUiip))  is  also  measurable,  and 
hence  so  is  their  countable  union  TZ{Sk{‘P  U  ip)).  □ 

Proposition  4.4:  If  S  and  S'  are  standard  assignments  satisfying  S'  <  S, 
then  for  every  agent  pi  and  point  c,  the  set  Si,c  can  be  partitioned  into  sets 
of  the  form  S-  d  with  d  6  5i,c- 

Proof:  Suppose  that  S  and  S'  are  standard  assignments  satisfying  S'  <  S. 
Since  S'  is  inclusive,  we  have  d  €  S'i  d  Q  Si,d  =  Si,e  for  every  d  6  Si,e,  and 
hence  Si,e  is  the  union  of  the  S'l^d  with  d  G  iS'i.c.  Furthermore,  since  S'  is 
uniform,  two  sets  Sid  and  Sl^  are  either  equal  or  disjoint,  and  hence  5,-,c  can 
be  partitioned  into  sets  of  the  form  S-  d  with  d  €  5i,e.  □ 
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Proposition  4.5:  In  a  synchronous  system,  ifV  and  V  are  consistent,  stan¬ 
dard  assignments  satisfying  V  <  V,  then  for  all  agents  ft,  all  points  c,  and 
all  measurable  subsets  S'  €  X! 

t}C 

(a)  S'  €  Xi,c  (so  that,  in  particular,  S-^  itself  is  a  measurable  subset  of 

5i,c). 


(b)  Rees';.,)  >  0. 

(c)  =  mAS'lsu  = 


Proof:  Fix  an  agent  Pi,  a  time  k  point  c  of  Ta  =  ('R’A)XA,fiA))  and  a  set 
€  XI,. 

(a)  Since  S'  6  X-  ^^  there  must  exist  some  subset  Tl'  6  Xa  such  that  S'  = 
Proj{'R,' ,  S-  ,).  Without  loss  of  generality,  we  can  assume  that  %'  C 
7^(5,'^)  (since  we  can  replace  Tt'  with  TV  n  7^(S',-^g),  which  must  zdso 
be  measurable  since  REQ2  guarantees  7^(5, 'c)  measurable).  Since 

SI,  C  Si,e  and  both  S^,  and  Si,e  consist  of  time  k  points  (since  P  and 
V  are  consistent  assignments),  we  have 


PT0i(K,Sl 


{(r',  k)  S  5;,.  :  r  S  K'}  =  {(/,  k)  S  Si..  :  r  e  K'} 

ProjiK'.Si^). 


Thus  S'  =  Proj{TZ.' y  Si,c),  which  shows  that  S'  is  a  measurable  subset 
of  Si,c. 


(b)  By  part  (a),  it  follows  that  S-^,  is  a  measurable  subset  of  5<,c.  Since 
we  have  restricted  to  standard  assignments  S',  we  know  that  S-^,  is 
state-generated,  and  arguments  simileir  to  the  proof  of  Proposition  4.1 
show  that  lJ>i,ciS'i^c)  >  0- 

(c)  Tracing  through  definitions,  we  see 

_  fiAins'))  f^Ams'))iPA(nsi,c)) 

^  PAinsi,))  fiA{nsi,c))/PA{nsi,c)) 

^  HejS') 

Pi,c{Si,c) 

□ 
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Proposition  4.6:  In  a  synchronous  system,  for  all  facts  (p,  all  agents  p,-, 
and  all  points  c,  the  rule  Bet{tp,  a)  is  7Veei,e*safe  for  pi  at  c  iff  Bet{<p,  a)  is 
JVceJ  g-safe  for  p,  at  c. 

Proof:  Since  the  sample  space  2Vcci,c  can  be  partitioned  into  the 

sample  spaces  Tree^^  j  with  d  6  Treci^ei  and  each  such  Treej  j  is  a  measurable 
subset  of  Tree{,e  by  Proposition  4.5.  The  law  of  conditional  expectation, 
therefore,  states  that 

where  the  summation  is  taken  over  all  sets  of  the  form  Treej  j  contained  in 
Treci^c-  Since  <  7?'®“,  we  can  use  part  (c)  of  Proposition  4.5  to  prove  that 
^TrttijWf\Treel^]  -  \Wf\,  and  hence  that 

Suppose  Bet{a,<p)  is  IVecJ^-safe  for  p,-  at  c.  Then  >  0  for  all 

points  d  agent  p,  considers  possible  at  c  and  all  /,  which  implies  ETrtei,t[Wf]  > 

0  for  all  points  e  agent  p{  considers  possible  at  c  and  all  /,  and  hence  that 
Bet{a,  (p)  is  JVee,-, e-safe  for  pi  at  c. 

Conversely,  suppose  Bet{a,  (p)  is  not  IVeeJ  g-safe  for  pi  at  c.  Then  E^^^  [IT/]  < 

0  for  some  point  d  agent  pi  considers  possible  at  c  and  some  /.  Let  f  be  the 

strategy  identical  to  /  on  Treej^i,  and  hence  on  Tree\^^,  but  offering  a  payoff 

of  1  everywhere  else.  If  pj  uses  strategy  f,  there  is  clearly  no  way  for  p;  to 

win  off  of  Treej^d  (the  best  p,  can  do  is  break  even),  so  that  E^^j  ^  0 

*** 

for  e  ^  d.  Moreover,  by  choice  of  d,  Ej^^_jWfi]  <  0.  It  follows  that 
i^Tree,.  JW/f]  <  0,  and  hence  that  Bet{a,(p)  is  not  2Vcei,e-safe  for  pi  at  c.  □ 


Theorem  4.7:  For  all  facts  tp  measurable  with  respect  to  V’,  all  agents  pi, 
and  all  points  c,  the  rule  Bet{<p,a)  is  safe  for  pi  at  c  iff  V’,c  |=  Kfp. 

Proof:  Consider  the  evaluation  of  Ec[Wf]  =  E^^  \Wsi}Pi°^)\  arbitrary 

«|C 

points  c  and  strategies  /.  Since  p,  has  the  same  local  state  at  all  points  of 
Treeing  and  /  is  a  function  of  Pj’s  local  state,  pj  offers  the  same  payoff  ^  for  a 
bet  on  v?  at  all  points  of  Treel^c-  Since  pi  is  following  Bet{p,  a)  at  all  points 
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of  Tree\  ,.,  agent  p,-  accepts  the  bet  at  all  points  of  Tree’i  ,.  or  rejects  the  bet  at 
all  such  points,  depending  on  whether  >  1/ a.  If  pi  rejects,  then  Ee[Wf]  is 
obviously  0.  K  p,-  accepts,  then  Pf’s  profit  is  /?  —  1  at  points  satisfying  (p  and 
—1  at  all  other  points,  and  hence  Ec[Wf]  =  (Notice  that 

because  tp  is  measurable  with  respect  to  we  are  guaranteed  that  Treei^Jjp) 
is  a  measurable  subset  of  Tree{^g,  and  hence  Tree\^Ji}p))  is  well-defined.) 

Suppose  1=  Kffp.  This  means  that  Mi,d(  ^  a  for  all 

points  d  agent  pf  considers  possible  at  c.  For  every  point  d  agent  p,-  considers 
possible  at  c  and  every  strategy  /  for  Pj,  therefore,  we  have  Ed[Wf]  >  0 
since  Pfi\J^Tree\Jjp))  —  1  >  (l/a)a  —  1  =  0  when  /3  >  1/a.  It  follows  that 
Bet{(p,  a)  is  safe  for  p;  at  c. 

Suppose  ^  K°p.  This  means  that  <  a  for  some 

point  d  agent  Pi  considers  possible  at  c.  Let  /  be  the  strategy  for  pj  offering 
a  payoff  of  1/a  for  a  bet  on  ^  at  all  points  pj  considers  possible  at  <2,  and 
hence  at  all  points  of  Tree\^^,  and  1  elsewhere.  It  follows  that  Ed[Wf]  < 
(l/a)a  —  1  =  0  for  the  given  strategy  /  and  the  given  point  d  agent  pj 
considers  possible  at  c,  and  hence  that  Bet{(p,  a)  is  not  safe  for  pj  at  c.  □ 

Theorem  4.8:  In  a  synchronous  system,  if  5  is  a  consistent  standard  as¬ 
signment,  then 

(a)  if  5  <S^,  then  S  determines  safe  bets  against  pj,  and 

(b)  if  S  determines  safe  bets  against  Pj  and  is  sufficiently  rich,  then 
S  <  S>. 

Proof:  Theorem  4.7  tells  us  that  determines  safe  bets;  from  Theo¬ 
rem  4.9(a)  (proved  below),  it  follows  that  if  «S  <  S’,  then  S  determines 
safe  bets  too.  This  proves  part  (a). 

To  prove  paxt  (b),  suppose  S  ^  S’,  which  means  Si,c  %  for 

some  agent  pi  and  point  c.  It  is  easy  to  construct  a  transition  proba¬ 
bility  assignment  r  inducing  a  distribution  p  on  the  runs  of  T  satisfying 
p{Tl{Si^c))  >  Jrcej  c)).  To  see  this,  notice  that  5'i,c  %  Tree’ic  implies 
d  €  i5i,c  and  d  ^  TVeCj  ,.  for  some  time  k  point  d  in  T;  and  if  Gd  is  the 
set  of  points  with  d’s  ^obal  state,  then  Gd  ^  Si,e  and  Gd  D  JVcCj,.  =  0 
since  S  and  S’  are  state-generated  (they  are  standard).  By  causing  r  to 
rissign  high  probabilities  to  the  edges  in  the  path  &om  the  root  of  T  to  d’s 
global  state  in  T,  we  can  guarantee  that  p(7^(Gd))  >  1/2.  This  gueirantees 
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that  fi('R,(S{,e))  >  fi(R(Gd))  >  1/2,  and  since  Gd  and  TVcej,.  are  disjoint, 
that  /i(7l(7Vee(J)  <  1  -/i(7l(Gd))  <  1/2;  so  JVeci, J)  as 

desired. 

Now  let  V  be  the  probability  assignment  induced  by  S  and  r,  and  let 
V  be  the  probability  assignment  induced  by  S'  and  r.  Furthermore,  let 
Ge  be  the  set  of  points  with  global  state  c,  let  ij)  be  the  fact  which  is  true 
precisely  of  the  points  in  Ge,  and  let  tp  =  -iij}.  Since  £($)  is  sufficiently  rich, 
it  follows  that  ^  G  $;  since  £($)  is  closed  under  negation,  it  follows  that 
=  ->V’  €  iC($). 

Since  both  S  and  are  steindard,  and  hence  both  inclusive  and  state¬ 
generated,  it  follows  that  Ge  Q  Si,e  H  Tree\^^.  Since  <p  is  false  only  at  points 
in  Ge,  and  since  Ge  is  contained  in  both  S'i.c  and  JVcej it  is  easy  to  see  that 


a  =  mASiAv)) 


MR(Si,.))-MR(Gc)) 

KnSi.c)) 


and 


/.(7t(2Ve4))-M7e(G.)) 


Furthermore,  since  S  is  uniform  (it  is  standard),  any  set  Si,e  not  equal  to  Si,e 
is  disjoint  from  Si,e  and  hence  from  Ge,  so  lii,e{Si,e{<p))  =  fti,eiSi,e)  =  1  for  all 
such  sets  5,-,e.  It  follows  that  |=  Kf‘(p. 

On  the  other  hand,  since  n{7i{Si,e))  >  since  /i(7^(Gc))  > 

0,  it  is  easy  to  see  that  a  >  /3.  Let  /  be  the  strategy  in  which  pj  offers  a 
payoff  of  1/a  on  Tretj^e,  and  suppose  pi  uses  the  rule  Bet{ip,a).  Clearly 
Wf  =  Wf{tp,a.)  is  1/a  —  1  on  2Veej  ,.(v?)  and  —1  off  this  set.  Thus, 


S{Wt)  =  (;  -  l)  (3  +  (-1)(1  -  ^)  <  (1  -  l)  a  -  (1  -  a)  =  0, 


which  means  Bei{<p,  a)  is  not  safe  for  p,  at  c. 


□ 


Note  that  the  universal  quantification  over  transition  probability  assign¬ 
ments  is  crucial  in  this  proof.  Given  a  fact  (p  false  only  at  points  in  the 
intersection  of  5,-,c  and  Treei  ,,,  the  proof  shows  that  a  necessary  condition  for 
7^,  c  1=  K^tp  to  imply  Bet{(p,  a)  is  safe  for  pi  at  c  is  that  the  measure  of  the 
runs  through  5',-,e  is  less  than  or  equal  to  the  measure  of  the  runs  through 


In  fact,  this  is  a  sufficient  condition  * :  well.  For  a  given  r  it  may 


be  possible  to  constract  a  set  Si,e  %  Tree{^e  satisfying  this  condition;  but  the 
only  way  to  satisfy  this  condition  for  all  r  is  to  take  Si,e  Q  Tree^^e' 
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Theorem  4.9:  In  a  synchronous  system,  if  V  and  V  are  consistent  standsurd 
assignments  satisfying  V  <  P,  then 

(a)  for  every  fact  tp,  every  agent  p,-,  every  point  c,  and  all  a,  with  0  < 
oi<  P  have 

V',  c  1=  implies  T’,  c  [= 


(b)  there  exist  a  fact  95,  an  agent  p,-,  a  point  c,  and  a,/3  with  0  <  a  <  /3  <  1 
such  that 

V,  c  ^  and  yet  V,c\= 

V,  c  and  yet  7>,  c  [= 

If  £($)  is  sufficiently  rich,  then  (p  €  >C($). 


Proof:  First  we  prove  part  (a).  Suppose  P',c  |=  Ki''^(p.  This  means  a  < 
PidS-SlM)  <  ti'i/{SlJ^(p))  <  /3  for  all  points  d  6  Ki{c).  Choose  d  €  Ki{c). 
Since  V  and  V  are  consistent  (and  uniform)  and  satisfy  V  <  P,  the  set  Si^d 
is  the  disjoint  union  of  a  collection  of  probability  spaces  S[  j^^ , . . . ,  iS,' ^  with 
dj  €  Si,d  C  Ki{c),  each  a  measurable  subset  of  5,*,^.  It  follows  that  Si,d{ip)  is 
the  disjoint  union  of  . . . ,  An  easy  computation  shows  that 

12} Pi,d,{3i,dj{<p))  ^  Pi,d,{Si,d{<p)).  Since  T>'  <  V,  Proposition  4.5  shows  that 
/i|- can  be  obtained  from  fii,d  by  conditioning  on  S-^dj'  follows  that 


(‘W.(SUW)  =  ■■  rcsi^M,r€Xi,.} 

=  sup{pJ,,.{7’'K.{S;^,)  :  T'CSi^M,  T'eXi^} 

=  sBp{pU,(r')  :  r  c  r  e  x!^} liUSU,) 

—  Pi,dj  J^ldf  (V^))  Pi,d(^ldj  )• 

Combining  the  preceding  statements,  we  have 


a  =  ^a/ii,d(£^ldj) 

} 

j 

<  P{.dJSi,d(<p)) 
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A  similar  argument  shows  <  p.  Since  these  arguments  hold  for 

all  d  €  Ki{c),  it  follows  that  V,c\= 

We  now  prove  part  (b).  Since  V  <  V,  it  follows  that  Si,c  contains  two 
distinct  sets  S[  ,.  and  S[^^  for  some  agent  p,-  and  points  c  and  d.  Let  ^  be  the 
fact  true  at  precisely  the  points  in  the  set  Gc  of  points  with  c’s  global  state, 
amd  let  (p  =  Notice  that  since  V  is  standard  and  hence  state-generated, 
Gc  is  contained  in  S^^  ami  disjoint  from  S-  j^.  If  £($)  is  sufficiently  rich,  then 
V*  €  $,  and  hence  ^  6 

Since  Gc  Q  C  5,-,c,  the  fact  (p  holds  with  probability  1  with  respect  to 
all  probability  spaces  determined  by  V  and  V  except  5,' ^  and  Si^c-  Since  V  < 
V,  Proposition  4.5  tells  us  that  can  be  obtained  from  pi,c  by  conditioning 
on  S-^^.  It  is  easy  to  see,  therefore,  that  p  holds  with  probability 


o'  =  /‘UKM)  = 


Pi,ci^i,c) 


with  respect  to  and  probability 


a  =  = 


Pi,e{Si,c)  Pi,e{Ge) 
Pi,e{Si,e) 


with  respect  to  5,-,e.  Since  pi,c(5^i,c)  <  Pi,ei^i,e)  =  1>  however,  it  is  easy  to  see 
that  a'  <  a.  It  follows  that  V,c\=  but  V',c  ^ 

On  the  other  hand,  -'p  holds  with  probability  0  with  respect  to  all  proba¬ 
bility  spaces  determined  by  V  and  V  except  5,',.  and  Si,c-  The  fact  -^p  holds 
with  probability  1  —  a'  with  respect  to  S[  c  and  probability  1  —  a  with  respect 
to  Si,c-  Since  a'  <  a,  we  have  1  —  a  <  1  —  a';  setting  /?  =  1  —  a,  it  follows 
that  V,c\=  Kf'^-^p  but  V,  c  ^  Kf^-^p.  □ 


Proposition  4.10;  V*'“\c  [=  K^^^p  iff  V^*',c  [=  K^^^p,  for  every  fact  p, 
agent  p,-,  and  point  c. 


Proof:  Consider  the  adversary  A€  pis  mapping  an  agent  pf  and  a  point  d 
to  the  set  of  points  defined  as  follows:  for  every  run  r  passing  through 
Trcc{  ,i,  there  is  a  point  (r,  k)  €  Tre^'i^d  satisfying  -^p  in  if  such  a  point 
exists,  and  an  arbitrary  point  (r,  k)  €  Treti^d  if  all  such  points  satisfy  p.  It  is 
easy  to  see  that  the  same  set  of  runs  pass  through  Sf,^  and  and  that  a 
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run  r  passes  through  iff  tp  is  true  at  all  points  of  r  contained  in  Treci^d- 

It  follows  that  Sf  j{(p)  and  Tree{^d{<p)  have  the  same  inner  measure.  On  the 
other  hand,  consider  an  arbitrary  adversary  B  E  pts  mapping  p,-  and  d  to 
the  set  Sf^d  (contained  in  2Vce|_j).  Suppose  the  run  r  passes  through  Sf^di’P)- 
It  follows  from  the  definition  of  that  tp  must  hold  at  every  point 

(r,  k)  €  JVeeJ  Since  5®^  must  contain  precisely  one  such  point,  r  must  pass 
through  Sfdiv)  as  well.  It  follows  that  the  inner  measure  of  Sfdif)  must  be 
at  least  the  inner  measure  of  Sf^dMi  hence  that  the  infimum  (taken  over 
all  adversaries  B  E  pis)  of  (/i®i).(5?^(v>))  is  precisely  (/i',y‘)*(3^«4<i(V’))-  ^ 
similar  construction  shows  that  the  supremum  (taken  over  all  adversaries  B  E 
pts)  of  (p®d)*(‘S'f(i(v’))  is  precisely  (/tf,a*)*(^®^,d(V’))"  Since  these  statements 
are  true  for  all  points  d  agent  pi  considers  possible  at  c,  we  have  |= 

iff  c  1=  □ 

4.B  Discussion 

In  this  appendix,  we  discuss  a  few  issues  related  to  observations  made  in  this 
chapter. 

4.B.1  The  need  for  protocols 

Although  from  a  computer  scientist’s  point  of  view,  it  seems  quite  natural 
to  assume,  as  we  do,  that  all  agents  in  a  system  follow  some  kind  of  a  proto¬ 
col,  protocols  are  not  quite  so  standard  in  the  probability  theory  literature. 
Interestingly,  Shafer  observes  [Sha85]  that  it  is  necessary  for  us  to  think  in 
terms  of  protocols  if  we  are  to  make  sense  of  “conditioning  on  everything  an 
agent  knows”  as  is  done  by  His  argument,  which  we  reproduce  here, 
is  based  on  Freund’s  puzzle  of  the  two  aces  (see  [Fre65];  other  references  are 
given  in  [Sha85]). 

Consider  a  deck  with  four  cards,  the  ace  and  deuce  of  hearts  and  spaces. 
After  a  fair  shuffle  of  the  deck,  two  cards  are  dealt  to  pi.  Now  what  is  the 
probability,  according  to  p2,  that  pi  holds  both  aces?  First,  notice  that  if  A, 
B,  C,  and  D  denote  the  events  that  pi  holds  two  aces,  at  least  one  ace,  the 


ac5  of  spaces,  and  the  ace  of  hearts,  respectively,  then 


Pt[A)  =  Pr{A  r\B)  =  Pt{A  H  (7)  =  i,  Pr{B)  =  Pr{C)  =  Pr{D)  =  i. 
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Suppose  Pi  first  says  it  holds  an  ace.  Conditioning  on  this  information, 
P2  computes  the  probability  pi  holds  both  aces  to  be 

As  a  result  of  learning  pi  holds  at  least  one  ace,  the  probability  according  to 
P2  that  Pi  holds  both  aces  increases. 

Suppose  Pi  then  says  it  holds  the  ace  of  spades.  Conditioning  on  this 
additional  information,  P2  computes  the  probability  pi  holds  both  aces  to  be 

As  a  result  of  learning  pi  holds  not  just  an  ace  of  spades  but  actually  holds 
the  ace  of  spades,  the  probability  according  to  P2  that  pi  holds  both  aces 
increases  even  more.  Similarly,  Pr{A\D)  =  1/3. 

But  is  this  second  computation  reasonable?  When  P2  learns  B,  then  P2 
knows  that  pi  has  either  the  ace  of  spades  or  the  ace  of  hearts.  When  P2  learns 
C,  then  P2  knows  that  pi  definitely  has  the  ace  of  spades.  Is  it  reasonable 
for  the  probability  p2  places  on  event  A,  that  pi  holds  two  aces,  to  increase 
from  1/5  to  1/3  simply  as  a  result  of  learning  which  of  the  two  aces  pi  has? 
It  seems  just  as  reasonable  to  argue  that  the  information  about  which  ace 
Pi  actually  has  is  useless,  and  p2’s  probability  of  A  shouldn’t  change  upon 
hearing  that  C  (or  D)  holds. 

As  Shafer  points  out,  the  right  way  for  P2  to  update  its  probability  of  A 
depends  on  what  protocol  the  agents  are  following.  If  the  agents  had  agreed 
Pi  would  first  reveal  whether  it  held  an  ace,  and  then  whether  it  held  the  ace 
of  spades,  then  the  increase  seems  reasonable:  if  pi  says  it  holds  an  ace,  then 
P2’s  learning  pi  does  not  hold  the  ace  of  spades  causes  p2’s  probability  that 
Pi  holds  both  aces  goes  down  to  0;  so  learning  that  pi  does  hold  the  ace  of 
spades  should  make  p2’s  probability  go  up.  On  the  other  hand,  if  the  agents 
were  following  a  protocol  whereby  pi  first  reveals  whether  it  has  an  ace,  and 
then,  if  it  does,  reveals  the  suit  of  one  of  the  aces  it  holds,  choosing  between 
hearts  and  spades  at  random  if  it  has  both  aces,  then  p2’s  probability  should 
not  change  as  a  result  of  hearing  that  pi  holds  the  ace  of  spades.^^  We  leave 
it  to  the  reader  to  construct  the  computation  trees  corresponding  to  the  two 

^'’'Although  Shafer  does  not  mention  this  point,  the  need  to  assume  that  pi  chooses 
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protocols  described  above,  and  to  check  that  using  'P'”'*,  we  do  indeed  get 
the  right  probabilities  in  each  case.  Again,  the  key  point  here  is  that  we  need 
the  protocol  to  be  completely  specified  in  order  to  appropriately  compute  the 
conditional  probabilities. 

4.B.2  Safe  bets  and  nonmeasurable  facts 

Recall  that  the  statement  of  Theorem  4.7  says  that  for  measurable  facts,  V* 
determines  safe  bets  against  pj.  The  condition  of  measurability  is  required  in 
order  for  the  use  of  expectation  in  the  definition  of  a  safe  bet  to  make  sense. 
Remember  that  Bct(^,0()  is  safe  for  Pi  at  c  if  Ed{Wf)  =  {Wf{(p,a)) 

IS  nonnegative  for  all  points  d  agent  pi  considers  possible  at  c,  and  for  all 
strategies  /  for  pj.  We  computed  in  the  proof  of  Theorem  4.7  that  Ed{Wf)  = 
- 1,  where  /?  is  the  payolF  offered  by  pj  in  Si,d  {Si,d  was  actually 
Treel  d).  In  order  for  lii,d{Si,d{<fi))  to  be  well  defined,  however,  Si,d(<p)  must 
be  a  mezisurable  subset  of  Si,d,  which  means  (p  must  be  measurable. 

In  fact.  Theorem  4.7  holds  for  nonmeasurable  facts  as  well,  but  we  must 
first  give  a  meaningful  definition  of  expectation  for  nonmeasurable  events. 
The  intuition  behind  the  inner  and  outer  measures  p,  and  p*  of  a  measure 
space  (5,  X,  fi)  is  that  fi*{S')  and  give  upper  and  lower  bounds  on  the 

probability  of  S']  if  S'  is  actually  a  measurable  set,  of  course,  these  bounds 
are  equal  to  the  actual  probability.  This  is  made  precise  by  a  classical  result 
[Hal50]  which  says  that  if  {S,  X',  u)  extends  {S,  X,  (i)  (in  that  X'  D  X  and  ft 
and  V  agree  on  X),  then  for  all  sets  X  €  X',  we  have  <  t'(A’)  < 

Moreover,  the  bounds  described  by  the  inner  and  outer  measure  are  actually 
attainable,  in  that  for  all  subsets  X  C  S,  there  is  a  probability  space  (S',  X',  v) 
extending  {S,X,fi)  such  that  X  E  X'  and  ^{X)  =  fi,{X)]  a  similar  result 
holds  in  the  case  of  outer  measure. 

We  want  to  extend  these  ideas  to  expected  value.  More  precisely,  we 
would  like  to  define  a  notions  of  inner  expected  value  and  outer-  expected 
value  for  a  “nonmeasurable”  random  variable  X  which  give,  respectively, 
lower  and  upper  bounds  on  what  should  be  the  expected  value  of  X  if  we 

between  hearts  and  spades  at  random  if  it  holds  both  aces  is  crucial  here.  For  example, 
suppose  P2  always  tells  pi  it  holds  the  ace  of  hearts  when  it  holds  both  aces.  In  this  case, 
Ps’s  probability  pi  holds  both  aces  should  decrease  to  0  when  pi  says  it  holds  the  ace  of 
spades. 
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were  to  extend  the  measure  space  as  above  to  make  X  measurable.  This 
requires  some  work  in  general,  but  in  the  special  case  where  X  takes  on 
only  two  values,  it  can  be  done  in  a  straightforwMd  way.  If  the  two  values 
taken  on  by  X  are  x  and  y,  with  x  >  y,  then  we  define  the  inner  and  outer 
expectations  of  a  random  variable  X  by 

E,{X)  =  xfi,{X  =  ®)  +  yfi*{X  =  y)  and 
E*{X)  =  xii*{X  =  i)  +  ytiJiX  -  y). 

It  is  not  hard  to  show  that  these  definitions  agree  with  the  expected  value 
if  the  set  X  =  X  is  measurable,  and  that  these  values  are  attainable  if  we 
extend  the  probability  space  in  the  right  way  to  make  X  =  x  measurable. 

Notice  that  the  random  variable  Wf  in  which  we  are  interested  in  fact 
takes  on  only  two  values  (depending  on  whether  ^  is  true  or  false).  Thus, 
applying  these  definitions,  we  get: 

MWf)  =  (/3  -  +  (-ly 

=  {13-  l)ii»{Si4{<p))  -  (1  -  fM,{Si,d{<p))) 

=  l3fi,{SiM)-l, 

which  looks  very  similar  to  the  formula  computed  for  measurable  facts.  Fol¬ 
lowing  the  last  two  paragraphs  of  the  proof  of  Theorem  4.7  using  this  formula, 
it  is  easy  to  see  the  rest  of  the  proof  holds,  and  hence  that  Theorem  4.7  is 
true  using  inner  expectation  in  place  of  expectation  in  the  definition  of  a  safe 
bet. 


Chapter  5 


A  Knowledge-Based  Analysis 
of  Zero  Knowledge 


In  this  chapter  we  study  the  relationship  between  knowledge  and  cryptogra¬ 
phy.  In  particular,  we  define  notions  of  knowledge  for  use  in  the  context  of 
cryptography,  and  analyze  interactive  and  zero  knowledge  proof  systems  in 
terms  of  these  notions  of  knowledge. 

5.1  Introduction 

Much  of  our  intuition  concerning  cryptography  depends  heavily  on  the  con¬ 
cept  of  knowledge.  For  example,  various  methods  of  encryption  [RSA78, 
GM84]  allow  two  agents  to  communicate  via  encrypted  messages  knowing 
that  other  polynomial-time  agents  will  know  little  or  nothing  about  the  con¬ 
tents  of  their  communication  (subject  to  certain  complexity-theoretic  as¬ 
sumptions).  Just  as  we  argue  informally  about  distributed  computation  in 
terms  of  the  knowledge  processors  have  about  their  environment,  the  same 
is  true  of  cryptography.  In  fact,  the  whole  point  of  cryptography  is  either 
to  transfer  knowledge  to  or  to  withhold  knowledge  from  various  agents  in 
a  system.  While  our  intuition  concerning  cryptography  depends  heavily  on 
knowledge,  researchers  have  yet  to  make  this  intuition  precise  in  terms  of 

This  chapter  is  joint  work  with  Joe  Halpern  and  Yoram  Moses.  An  earlier  version  of 
this  work  appeared  in  Proceedings  of  the  20th  ACM  Symposium  on  Theory  of  Computing 
[HMT88]. 
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formal  definitions  of  knowledge.  The  purpose  of  this  chapter  is  to  develop 
definitions  of  knowledge  that  we  hope  will  be  useful  in  the  general  construc¬ 
tion,  analysis,  and  understanding  of  cryptographic  protocols. 

When  developing  such  definitions,  it  is  helpful  to  keep  in  mind  concrete 
examples  of  cryptographic  protocols.  One  class  of  protocols,  the  class  of 
interactive  and  zero  knowledge  proof  systems  [GMR89],  has  received  a  great 
deal  of  attention  from  the  cryptographic  community.  Loosely  speaking,  an 
interactive  proof  is  a  conversation  between  an  infinitely  powerful  prover  and  a 
polynomial-time  verifier  in  which  the  prover  tries  to  convince  the  verifier  that 
a  certain  fact  (p  is  true,  typically  a  fact  of  the  form  x  E  L.  The  proof  consists 
of  a  sequence  of  rounds  in  which  the  verifier  asks  the  prover  a  question,  and 
the  prover  answers  the  question.  Loosely  speaking,  such  a  proof  is  said  to 
be  zero  knowledge  if  the  prover  does  not  leak  any  “knowledge”  to  the  verifier: 
that  is,  anything  the  verifier  knows  (or  knows  how  to  compute)  at  the  end  of 
the  proof  the  verifier  already  knows  at  the  beginning  of  the  proof  (with  the 
exception,  of  course,  of  the  fact  (p  being  proven). 

The  reason  these  protocols  have  received  so  much  attention  is  that  they 
seem  to  be  fundamental  building  blocks  in  the  construction  of  other  crypto¬ 
graphic  protocols.  To  see  why  this  is  true,  consider  two  agents  p  and  q  both 
of  whom  want  to  use  a  certain  resource  in  the  system,  and  suppose  they  agree 
to  flip  a  coin  to  determine  which  of  them  gets  to  use  the  resource  first.  Since 
neither  wants  the  other  to  be  able  to  influence  the  outcome  of  the  coin  in  its 
own  favor,  how  should  p  and  q  go  about  flipping  this  coin? 

One  such  coin  flipping  scheme  based  on  oblivious  transfer  is  given  by 
Rabin  in  [RabSl]  (see  also  [Blu,  FMR84]).  This  coin  flipping  scheme  consists 
of  four  steps: 

1.  Agent  p  first  selects  two  distinct,  odd  primes  and  sends  their  product 
n  to  agent  q. 

2.  Agent  q  then  selects  an  integer  x  at  random  from  the  group  Z*  of 
integers  between  1  and  n  relatively  prime  to  n,  and  sends  x^  to  p. 

It  is  not  hard  to  show,  since  n  is  the  product  of  two  distinct,  odd  primes,  that 
aj*  will  have  four  distinct  square  roots  of  the  form  x,  —x,  y,  and  —y.  Agent 
p  is  able  to  compute  these  square  roots  since  it  knows  the  factorization  of  n. 

3.  Agent  p  randomly  chooses  a  square  root  of  x^  and  sends  it  to  q. 


5.1.  INTRODUCTION 


155 


Given  one  of  x  or  —x  and  one  of  y  or  —y,  it  is  not  hard  to  show  that  the 
greatest  common  divisor  of  x  +  y  and  n  or  of  x  —  y  and  n  is  a  nontrivial 
divisor  of  n.  Agent  q  can  easily  compute  the  greatest  common  divisor  of  two 
numbers. 

4.  Agent  q  computes  a  nontrivial  divisor  of  n  and  sends  it  to  p. 

Agent  q  wins  the  coin  flip  iff  it  sends  a  nontrivial  divisor  of  n  to  p. 

Suppose  that  p  and  q  axe  honest  and  follow  this  protocol  exactly  (that  is, 
they  do  not  cheat  in  any  way).  In  this  case,  it  is  not  hard  to  convince  oneself 
that  agent  q  wins  the  coin  toss  with  probability  exactly  1/2:  roughly  speak¬ 
ing,  since  p  chooses  the  square  root  to  send  to  q  at  random,  with  probability 
1/2  agent  p  sends  either  y  or  —y,  in  which  case  q  can  compute  a  divisor  of 
n  and  win  the  coin  toss;  and  with  probability  1/2  agent  p  sends  either  x  or 
— X,  in  which  case  q  gains  no  new  information  to  help  it  compute  a  divisor 
of  n  and  presumably  loses  the  coin  toss.  If  p  or  q  cheat  in  some  way  during 
the  protocol,  however,  then  it  is  possible  for  q  to  win  the  coin  toss  with  some 
probability  other  than  1/2.  The  protocol  depends,  for  example,  on  the  fact 
that  the  integer  n  constructed  by  p  is  really  the  product  of  two  distinct,  odd 
primes  as  required.  Since  it  seems  possible  p  could  construct  an  n  not  of 
this  form  that  would  skew  the  outcome  of  the  coin  flip  in  p’s  favor,  q  should 
demand  to  be  convinced  that  n  is  of  the  correct  form  before  continuing  with 
the  coin  flip.  On  the  other  hand,  p  does  not  want  q  to  know  any  more  about 
the  factorization  of  n  after  being  convinced  n  is  of  the  right  form,  since  this 
could  skew  the  outcome  of  the  coin  flip  in  q’s  favor.  If  q  can  compute  one  of 
the  prime  factors  after  being  convinced  n  is  of  the  correct  form,  for  example, 
then  q  can  always  win  the  coin  toss.  What  we  need  here  is  a  way  for  p  to 
convince  q  that  n  is  of  the  right  form  without  giving  q  any  additional  infor¬ 
mation  about  71,  and  this  is  precisely  what  zero  knowledge  proof  systems  are 
designed  to  do.^ 

Because  interactive  and  zero  knowledge  proof  systems  serve  2is  building 
blocks  in  the  design  of  cryptographic  protocols,  and  because  the  concept  of 
knowledge  is  so  fundamental  to  our  understanding  of  these  proof  systems, 
we  choose  to  begin  our  study  of  knowledge  and  cryptography  with  interac¬ 
tive  and  zero  knowledge  proof  systems.  In  this  work,  we  will  concentrate 

^As  shown  in  [FMR84],  zero  knowledge  proois  can  also  be  used  to  avoid  problems 
arising  when  q  tries  to  cheat. 
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on  developing  definitions  of  knowledge  that  let  us  formalize  our  intuition 
concerning  such  proof  systems.  The  notions  of  knowledge  most  appropriate 
in  this  context,  however,  are  far  more  subtle  than  the  standard  notions  of 
knowledge  used  so  often  in  the  analysis  of  distributed  computation  (and,  in 
particular,  the  notions  defined  in  Chapter  2).  Since  cryptographic  protocols 
are  typically  probabilistic  protocols  that  guarantee  only  that  correctness  con¬ 
ditions  are  satisfied  with  high  probability,  definitions  of  knowledge  such  as 
probabilistic  knowledge  discussed  in  Chapter  4  that  incorporate  knowledge 
and  probability  will  almost  certainly  be  useful.  More  perplexing,  however,  is 
the  fact  that  the  computational  power  of  agents  in  cryptographic  systems  is 
typically  assumed  to  be  restricted  to  polynomial- time.  Recall  that,  accord¬ 
ing  to  the  standard  information-theoretic  definition  of  knowledge,  an  agent  is 
said  to  know  all  facts  that  follow  from  its  loczd  state,  regardless  of  the  com¬ 
putational  complexity  of  determining  that  these  facts  hold.  In  the  context  of 
cryptography,  however,  the  computational  intractability  of  a  problem  is  used 
to  keep  secret  certain  pieces  information.  Cryptography  is  concerned  with 
what  an  agent  can  compute  that  it  knows  in  polynomial  time,  and  cryp¬ 
tographic  protocols  typically  make  guarantees  such  as  no  polynomial-time 
agent  knows  any  more  after  eavesdropping  on  a  conversation  between  two 
other  agents  than  it  did  beforehand.  In  this  context,  the  standard  definition 
of  knowledge  is  clearly  inappropriate. 

Our  fundamental  contribution  is  the  definition  of  practical  knowledge, 
which  incorporates  knowledge  and  probability  with  restrictions  on  agents’ 
computational  powers.  This  definition  is  based  on  the  definition  of  resource- 
bounded  knowledge  given  in  [Mos88],  which  defines  knowledge  in  terms  of 
polynomial-time  tests  an  agent  can  use  to  determine  whether  it  knows  a  fact. 
Using  the  definition  of  practical  knowledge,  we  characterize  interactive  proof 
systems  in  terms  of  a  formal  statement  about  knowledge.  This  statement 
essentially  says  “at  the  end  of  a  proof  of  a:  €  i,  the  verifier  knows  x  € 
L,”  which  is  precisely  what  our  intuition  demands  of  an  interactive  proof 
system.  Furthermore,  using  the  definition  of  practical  knowledge,  we  state 
a  property  of  zero  knowledge  we  call  knowledge  security,  and  prove  that  any 
zero  knowledge  proof  system  satisfies  this  property.  Loosely  speaking,  this 
property  says  “the  prover  in  a  zero  knowledge  proof  oi  x  G  L  knows,  with 


high  probability,  that  if  the  verifier  knows  a  fact  ip  at  the  end  of  the  proof. 


then  the  verifier  already  knows  x  G  L  D  tp  at  the  beginning  of  the  proof.” 


This  captures  our  intuition  that  a  zero  knowledge  proof  does  not  “leak” 
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knowledge  of  any  fact  other  than  facts  following  from  x  &  L,  the  fact  the 
prover  initially  set  out  to  prove. 

Related  to  the  concept  of  knowing  a  fact  is  knowing  how  to  do  something 
(how  to  perform  a  given  operation).  There  is  a  difference,  for  example, 
between  knowing  the  fact  that  an  integer  is  composite  and  knowing  how  to 
generate  a  prime  factor  of  the  integer.  Zero  knowledge  proofs  are  intended  not 
to  leak  any  knowledge  of  this  kind  as  well  as  any  knowledge  of  facts.  While 
this  concept  of  “knowing  how”  has  also  been  of  great  interest  in  philosophy 
and  AI  (see  [Moo85]),  standsird  notions  of  knowledge  do  not  capture  this 
aspect  of  knowledge.  We  define  a  notion  of  knowing  how  to  generate  a  y 
satisfying  a  relation  R{x,y),  again  incorporating  knowledge  and  probability 
with  bounds  on  agents’  computational  resources.  In  the  context  of  a  proof 
of  a  €  Zr,  for  example,  we  might  take  the  relation  R{x,  y)  to  mean  “y  is  a 
prime  factor  of  x.”  With  this  definition,  we  can  again  state  a  property  of  zero 
knowledge  proof  systems  we  call  generation  security,  and  prove  that  any  zero 
knowledge  proof  system  satisfies  this  property.  This  property  essentially  says 
“the  prover  in  a  zero  knowledge  proof  olx  €.  L  knows,  with  high  probability, 
that  if  the  verifier  knows  how  to  generate  a  y  satisfying  R{x,  y)  at  the  end 
of  the  proof,  then  the  verifier  knows  how  to  do  so  at  the  beginning  of  the 
proof.”  This  captures  our  intuition  that  during  a  zero  knowledge  proof  the 
prover  does  not  “leak”  to  the  verifier  any  knowledge  of  how  to  do  anything, 
let  alone  any  knowledge  of  facts. 

We  find  it  interesting  that,  while  these  two  properties  (knowledge  and 
generation  security)  capture  everything  the  popular  intuition  says  we  want 
from  zero  knowledge  proof  systems,  we  are  unable  to  prove  that  any  proof 
system  satisfying  these  properties  is  zero  knowledge.  This  raises  the  inter¬ 
esting  question  of  whether  the  cryptographic  definition  of  a  zero  knowledge 
proof  system  is  one  of  several  possible  implementations  of  what  we  should  be 
calling  zero  knowledge,  or  whether  there  is  some  crucial  aspect  of  this  clever 
definition  of  zero  knowledge  the  populu  intuition  is  missing. 

Other  questions  about  zero  knowledge  proof  systems  also  arise  in  this 
framework.  For  example,  recall  that  interactive  and  zero  knowledge  proof 
systems  are  defined  in  the  context  of  infinitely  powerful  provers,  but  only 
polynomial-time  verifiers.  In  practice,  however,  both  the  prover  and  the  veri¬ 
fier  are  polynomial-time  agents.  Although  most  of  the  proof  systems  defined 
in  the  context  of  infinitely  powerful  provers  can  be  followed  by  polynomial¬ 
time  provers  if  these  weak  provers  are  supplied  with  some  secret  information 
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(such  as  the  factorization  of  n  in  the  coin  flipping  exsunple  above),  an  in¬ 
teresting  question  to  ask  is  whether  any  properties  of  these  proof  systems 
change  as  a  result  of  the  fact  that  the  prover  is  a  polynomial-time  agent  and 
not  infinitely  powerful.  For  example,  suppose  we  are  given  an  interactive 
proof  system  for  membership  in  a  language  L  defined  in  the  context  of  in¬ 
finitely  powerful  provers,  and  suppose  we  run  this  protocol  in  the  context  of 
weak  provers.  Is  this  protocol  stiU  a  proof  system  for  membership  in  i,  or 
does  it  actually  prove  more  or  less  than  simple  membership  in  LI 

In  order  to  answer  such  questions,  we  define  weak  interactive  proof  sys~ 
terns  in  which  the  prover  (as  well  as  the  verifier)  is  restricted  to  probabilistic, 
polynomial-time  computation.  We  prove  that  if  L  has  a  weak  interactive 
proof  system,  then  L  must  be  contained  in  BPP  (and  hence  that  the  ver¬ 
ifier  can  determine  whether  ®  €  L  on  its  own  without  even  consulting  the 
prover).  Since  the  interesting  languages  having  proof  systems  in  the  context 
of  infinitely  powerful  provers  are  not  known  to  be  contained  in  BPP  (see 
[GMR89,  GMW86)),  these  proof  systems  must  prove  more  to  the  verifier 
than  simple  language  membership  when  run  by  polynomial-time  provers.  In 
fact,  we  can  prove  in  a  precise  sense  that  such  proof  systems  must  actually 
be  proofs  about  the  prover’s  knowledge.  Furthermore,  we  show  that,  under 
natural  conditions,  the  notions  of  interactive  proofs  of  knowledge  defined  in 
(FFS87]  and  [TW87]  Me  instances  of  such  weak  interactive  proofs  of  knowl¬ 
edge.  In  this  framework,  using  the  language  of  knowledge,  we  can  make 
precise  several  differences  between  these  two  notions  of  proofs  of  knowledge. 
Finally,  we  show  that  zero  knowledge  weak  interactive  proofs  guarantee  the 
same  type  of  security  with  respect  to  the  facts  they  prove  as  zero  knowledge 
interactive  proofs  guarantee  with  respect  to  language  membership. 

We  believe  that  our  analysis  provides  a  great  deal  of  insight  into  (and 
support  for)  the  definitions  in  [GMR89]  and  their  extensions  to  the  case  of 
proofs  about  knowledge  in  [FFS87,  TW87].  None  of  our  technical  results 
about  the  definitions  themselves  is  very  deep;  the  difficulty  was  in  coming  up 
with  the  right  notions  of  knowledge  to  use  when  thinking  about  them.  While 
the  definitions  of  knowledge  we  give  here  are  motivated  by  interactive  and 
zero  knowledge  proof  systems,  we  believe  they  are  potentially  useful  when 
tbi>;Mng  about  cryptographic  protocols  in  general.  We  note  that  Fischer  and 


Zuck  [FZ87j  also  consider  notions  of  knowledge  (closely  related  to  our  notion 


of  knowing  how  to  generate)  for  use  in  the  context  of  interactive  and  zero 


knowledge  proof  systems,  and  use  their  definitions  of  knowledge  to  analyze 
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an  interactive  proof  of  quadratic  residuosity.  We  believe  that  thinking  about 
interactive  and  zero  knowledge  proof  systems  (and  cryptography  in  general) 
in  terms  of  knowledge  provides  a  good  framework  within  which  to  think 
about  cr3q}tographic  definitions  and  their  appropriateness. 

The  rest  of  the  chapter  is  organized  as  follows.  In  the  next  section,  Section 
5.2,  we  give  the  cryptographic  definitions  of  interactive  and  zero  knowledge 
proof  systems.  In  Section  5.3,  we  show  how  these  definitions  motivate  the  the 
definition  of  practical  knowledge.  In  the  following  Sections  5.4  and  5.5,  we 
show  how  practical  knowledge  can  be  used  to  characterize  interactive  proof 
systems  in  terms  of  knowledge,  and  how  practical  knowledge  can  be  used  to 
make  precise  the  intuition  that  the  verifier  in  a  zero  knowledge  proof  does 
not  know  any  more  at  the  end  of  the  proof  them  it  did  at  the  beginning.  In 
Section  5.6  we  define  the  notion  of  “knowing  how,”  and  show  that,  in  a  precise 
sense,  the  verifier  cannot  do  any  more  at  the  end  of  a  zero  knowledge  proof 
than  it  could  at  the  beginning.  Section  5.7  introduces  weak  interactive  proofs, 
relates  them  to  the  proofs  of  knowledge  of  [FFS87,  TW87],  and  proves  that 
zero  knowledge  weak  interactive  proofs  are  secure  in  the  senses  defined  above. 
Finally,  in  Section  5.8,  having  characterized  the  definition  of  an  interactive 
proof  system  in  terms  of  knowledge,  we  sketch  an  example  of  how  we  can 
use  this  characterization  to  reason  about  interactive  proof  systems.  More 
precisely,  we  prove  the  familiar  result  that  the  sequential  composition  of 
two  interactive  proofs  is  itself  ;in  interactive  proof.  The  chapter  ends  with 
Appendix  5. A,  in  which  we  give  the  proofs  of  the  results  claimed  in  this 
chapter. 


5.2  Interactive  and  Zero  Knowledge 
Proof  Systems 

We  begin  with  the  formal  cryptographic  definitions  of  interactive  and  zero 
knowledge  proof  systems,  and  a  few  informal  examples. 


5.2.1  Interactive  protocols 

Recall  that,  loosely  speaking,  an  interactive  proof  is  a  conversation  between 
a  prover  and  a  verifier  in  which  the  prover  tries  to  convince  the  verifier  that 
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a  certain  fact  is  true.  This  idea  of  a  conversation  between  two  agents  is  made 
precise  by  the  definition  of  an  interactive  pr  jtocol. 

Formally,  an  interactive  protocol  [GMR89]  is  an  ordered  pair  {PyV)  of 
probabilistic  Turing  machines,  where  P  and  V  axe  intuitively  descriptions  of 
the  protocols  to  be  followed  by  the  prover  p  and  the  verifier  v,  respectively. 
The  Turing  machines  P  and  V  share  a  read-only  input  tape]  each  has  a  private 
one-way,  read-only  random  tape]  each  has  a  private  work  tape]  and  P  and  V 
share  a  pair  of  one-way  communication  tapes,  one  from  P  to  V  being  write- 
only  for  P  and  read-only  for  V,  and  the  other  from  V  to  P  being  write-only 
for  V  and  read-only  for  P. 

A  run  of  the  protocol  (P,  V)  proceeds  as  follows.  Initially,  the  common 
input  tape  is  initialized  with  some  string  x,  the  two  random  tapes  are  ini¬ 
tialized  with  infinite  strings  of  independent,  random  bits,  the  two  work  tapes 
are  initialized  with  strings  s  and  tj^  and  the  two  communication  tapes  are 
blank.^  The  remainder  of  the  run  consists  of  a  sequence  of  rounds.  During 
any  given  round,  V  first  performs  some  internal  computation  making  use  of 
its  work  tape  and  other  readable  tapes,  and  then  sends  a  message  to  P  by 
writing  on  F’s  write-only  commumcation  tape  (which  is  P’s  read-only  tape); 
P  then  performs  a  similar  computation.  It  is  not  hard  to  see,  for  example, 
that  we  can  view  the  coin  flipping  example  given  in  the  introduction  as  a 
two-round  interactive  protocol. 

At  any  time  during  a  run  of  an  interactive  protocol  (P,  V),  either  P  ox  V 
can  halt  the  interaction  by  entering  a  halt  state.  V  cam  accept  or  reject  an 
interaction  by  entering  an  accepting  or  rejecting  halt  state,  respectively,  in 
which  case  we  refer  to  the  resulting  run  as  either  an  accepting  or  rejecting 
run.  The  running  time  of  P  or  V  during  a  run  of  (P,  F)  is  the  total  number 
of  steps  taken  by  P  or  V,  respectively,  during  the  run.  We  assume  that  V  is 
a  probabilistic  Turing  machine  running  in  time  polynomial  in  |x|,  and  hence 
that  it  can  perform  only  probabilistic,  polynomial-time  computations  during 
each  round,  and  participate  in  only  a  polynomial  number  of  rounds.  Conse¬ 
quently,  we  cEin  assume  that  V  always  halts  the  interaction  eifter  a  polynomial 

*The  the  need  for  allowing  initial  values  on  the  work  tapes  was  first  observed  in  [Ore87, 
TW87];  we  will  return  to  this  issue  when  we  define  sero  knowledge  in  Section  5.2.3  and 
weak  interactive  proof  systems  in  Section  5.7. 

‘  Actually,  since  we  want  to  ran  interactive  protocols  as  subroutines  of  other  protocols, 
it  is  enough  to  assume  the  unread  cells  on  the  communication  tapes,  the  cells  to  the  right 
to  the  tape  heads,  are  blank. 
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number  of  rounds,  eind  always  enters  either  m  accepting  or  rejecting  state. 
We  will  make  no  assumption  about  the  running  time  of  P  for  the  moment, 
although  in  Section  5.7  (when  we  consider  weak  provers)  we  will  assume  that 
P  runs  in  probabilistic,  polynomial-time  as  well. 

In  terms  of  the  model  of  computation  defined  in  Chapter  2,  the  system 
corresponding  to  the  interactive  protocol  (P,  F)  consists  of  two  agents,  the 
prover  p  and  the  verifier  v.  Notice  that  we  distinguish  the  agents  p  and  v 
from  the  protocols  P  and  V  they  follow.  A  run  of  this  interactive  protocol  is 
am  infinite  sequence  of  global  states,  where  each  global  state  consists  of  one 
local  state  for  the  prover  p  and  one  for  the  verifier  v.  Agent  p’s  local  state  is  a 
tuple  consisting  of  a  description  of  the  Turing  machine  P,  the  current  round 
number  (an  interactive  protocol  is  a  synchronous  protocol),  the  contents  of 
the  input  tape,  the  finite  prefix  of  its  random  tape  read  up  to  this  point,  the 
contents  of  its  work  tape,  the  contents  of  the  two  communication  tapes,  and 
the  position  of  the  tape  heads  on  each  of  these  tapes;  agent  v’s  local  state 
is  defined  in  a  similar  fashion.  We  assume  for  the  sake  of  convenience  that 
prover  and  verifier  each  encode  their  complete  history  on  their  work  tapes. 
Since  we  think  of  the  prover  and  verifier  as  alternating  steps,  we  think  of 
the  verifier  as  being  active  at  even  times,  and  the  prover  being  active  at 
odd  times.  It  is  not  hard  to  see  that  the  protocols  described  by  the  Turing 
machines  P  and  V  can  be  captured  in  terms  of  the  definition  of  a  protocol 
given  in  Chapter  2.  We  denote  the  system  consisting  of  all  possible  runs 
of  (P,  V)  by  P  X  V.  The  following  systems  will  also  be  useful  later  in  this 
chapter:  P  x  V”*,  the  system  consisting  of  the  union  of  the  systems  P  xV* 
for  all  probabilistic,  polynomial-time  V*;  VxV,  the  system  consisting  of  the 
union  of  the  systems  P*  x  V  for  all  Turing  machines  P*;  and  X  V,  the 
system  consisting  of  the  union  of  the  systems  P*  xV  for  aJl  probabilistic, 
polynomial- time  P*. 


5.2.2  Interactive  proof  systems 

The  next  step  in  the  definition  of  a  zero  knowledge  proof  system  is  to  define 
what  it  means  for  an  interactive  protocol  to  be  a  proof  system.  Loosely 
speaking,  an  interactive  protocol  is  a  proof  system  for  a  language  L  if  the 
verifier  accepts  the  comiuoii  input  v  with  xugu  prooaoiiiry  wnen  sc  G  1/  and. 
rejects  with  high  probability  when  x  ^  L. 

Given  an  interactive  protocol  (P,  V),  we  denote  by  (P(s),  V’(t))(x)  the 
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random  variable  assuming  as  values  runs  of  the  protocol  (P,  V)  in  which  the 
input  tape  is  initialized  with  «  eind  the  prover  and  verifier  work  tapes  are 
initialized  with  s  and  L  More  precisely,  we  denote  by  (P(a),  V^(t))(®)  the 
random  variable  mapping  a  sequence  p  of  coin  flips  to  the  run  of  (P,  V)  in 
which  the  common  input  tape  is  initialized  with  x,  the  prover’s  work  tape 
with  s,  and  the  verifier’s  work  tape  with  t,  and  p  is  the  sequence  of  coins 
flipped  by  the  prover  and  verifier  during  this  run/  We  write  *(P(«),  F(t))(x) 
accepts'  to  denote  the  fact  that  the  run  assumed  by  (P(a),  V’(t))(x)  as  a  value 
is  an  accepting  run.  An  interactive  protocol  (P,  V)  is  said  to  be  an  interactive 
proof  system  for  a  language  L  if  the  following  conditions  are  satisfied: 

•  Completeness:  For  every  fc  >  1  and  sufficiently  large  x,  and  for  every  s 
and  t, 


iix  £  L,  then  pr[(P(s), F(t))(x)  accepts]  >  1  —  Ixp* . 

•  Soundness:  For  every  A:  >  1  and  sufficiently  large  x,  for  every  P*,  and 
for  every  s  and  t, 

Hz  ^  L,  then  pr[(P*(s),  V’(t))(x)  accepts]  <  |x|”* . 

We  use  “sufficiently  large  x”  as  a  shorthand  for  “there  exists  JV*  >  1  such 
that  for  every  x  satisfying  jxj  >  Nk”  The  subscript  k  in  Nk  reflects  the  fact 
that  the  notion  of  “sufficiently  large”  depends  on  the  size  of  k.  Without  loss 
of  generality,  we  can  always  assume  that  the  same  value  Nk  is  used  in  both 
the  soundness  and  completeness  conditions. 

We  refer  to  p  as  the  “good  prover”  when  it  is  running  P,  and  to  v  as 
the  “good  verifier”  when  it  is  running  V.  The  completeness  condition  is  a 
guarantee  to  both  the  good  prover  and  the  good  verifier  that  Hz  E  L,  then 
with  overwhelming  probability  the  good  prover  will  be  able  to  convince  the 
good  verifier  that  z  E  L.  The  soundness  condition  is  a  guarantee  to  the 
good  v'^rifier  that  if  x  ^  X,  then  the  probability  that  an  arbitrary  (possibly 
malicious)  prover  P*  is  able  to  convince  the  good  verifier  that  x  6  X  is  very 

^We  sometimes  refer  to  a  run  assumed  as  a  value  by  (P(«),  V(t))(x)  as  “a  run  of  (P,  V) 
on  input  x  with  a  and  t”  We  often  abuse  notation  and  use  (P(4),  V’(t))(z)  to  denote  an 
arbitrary  such  run,  or  even  the  set  of  ail  such  runs.  The  meaning  will  always  be  clear  £com 
context. 
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low.  Intuiti^'ely,  therefore,  the  verifier  “knows”  that  x  ^  L  when  it  accepts, 
since  the  chance  of  accepting  when  x  ^  Z  is  so  low. 

We  note  that  this  definition  of  an  interactive  proof  system  is  stated  in 
terms  of  a  distribution  over  coin  flips.  This  definition  can  be  translated 
immediately  into  a  statement  in  terms  of  a  distribution  over  runs  using  the 
framework  given  in  Chapter  4  as  follows.  Notice  that  once  we  fix  the  initial 
state  (meaning  that  we  fix  P,  V,  s,  t,  and  x),  we'can  view  the  runs  with  this 
initial  state  as  a  single  computation  tree  as  defined  in  Chapter  4.  Recall  that 
P  X  V  is  the  system  consisting  of  zdl  possible  runs  of  (P,  V),  and  that  V  xV 
is  the  system  consisting  of  the  union  of  the  systems  P*  x  V  for  all  Turing 
machines  P*.  In  terms  of  the  assignment  the  soundness  condition 
says  that  the  formula  Pr[Oaccepi]  >  1  —  |x|“*  is  true  at  all  initial  points  of 
PxV  satisfying  x  €  Z,  and  the  completeness  condition  says  that  the  formula 
Pr[Oacccp<]  <  |x|“*  is  true  at  all  initial  points  ofVxV  satisfying  x  0  Z.  In 
this  chapter,  we  are  careful  to  write  pr[ip\  >  a  when  the  probability  space  is 
a  set  of  coin  flips,  and  to  write  Pr[(p]  >  a  when  the  probability  space  is  a  set 
of  runs  (and,  in  particular,  when  Pr[<p]  >  a  is  to  be- interpreted  as  a  formula 
in  our  logic  of  knowledge  and  probability). 

One  of  the  best  known  examples  of  an  interactive  proof  system  is  the 
proof  system  for  graph  isomorphism  from  [GMW86].  Two  graphs  Gq  and  G\ 
are  said  to  be  isomorphic  if  there  is  a  bijection  h  between  the  nodes  of  Go  and 
Gi  with  the  property  that  (u,  v)  is  an  edge  of  Go  iff  (h(u),  h(v))  is  an  edge  of 
Gi.  The  graph  isomorphism  problem  is  formula.ted  in  terms  of  membership 
in  the  language  of  ordered  pairs  (Go,Gi),  where  Go  and  Gi  are  isomorphic 
graphs.  One  simple  interactive  proof  system  for  graph  isomorphism  is  for  the 
prover,  on  input  (Go,  Gi),  to  send  the  verifier  an  isomorphism  h  between  Go 
and  Gi,  and  have  the  verifier  check  that  h  is  indeed  an  isomorphism;  but  this 
clearly  gives  the  verifier  more  information  than  the  simple  fact  that  the  two 
graphs  are  isomorphic;  it  actually  gives  the  verifier  an  isomorphism!  The 
protocol  of  [GMW86]  is  not  this  explicit.  Suppose  h  is  an  isomorphism  from 
Go  to  Gi,  which  can  either  be  computed  by  an  infinitely  powerful  prover  or 
supplied  as  auxiliary  input  to  the  prover  as  an  initial  value  on  its  work  tape. 
The  protocol  consists  of  n  =  |(Go,  Gi)|  rounds,  where  each  round  consists  of 
the  following  sequence  of  steps: 

1.  The  prover 

(a)  chooses  a  random  permutation  tt  of  the  vertices  of  Go  —  {Vo,  Eo), 
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(b)  computes  H  =  {Vo,E),  where  H  is  the  graph  isomorphic  to  Go 
defined  by  (7r('u),  7r(t;))  €  iff  (u,  v)  €  Eq,  and 

(c)  sends  H  to  the  verifier. 

2.  The  verifier  chooses  a  bit  a  at  random  and  sends  a  to  the  prover. 

3.  The  prover  sends  the  verifier  an  isomorphism  from  Ga  to  H:  ii<x  =  0, 
the  prover  sends  h;  if  a  =  1,  the  prover  sends  Th~^. 

4.  The  verifier  checks  that  the  mapping  received  from  the  prover  is  indeed 
an  isomorphism  from  Ga  to  H. 


The  verifier  accepts  at  the  end  of  n  rounds  iff  all  n  iterations  of  the  protocol 
are  successfully  completed. 

It  is  not  hard  to  show  that  this  interactive  protocol  is  indeed  an  inter¬ 
active  proof  system  for  graph  isomorphism.  If  the  two  graphs  Go  and  Gi 
are  isomorphic,  then  the  prover  will  always  be  able  to  send  the  verifier  an 
isomorphism  ir  or  Th~^  from  Go  or  Gi  to  H,  depending  on  which  is  requested 
by  the  verifier,  and  hence  will  always  cause  the  verifier  to  accept.  Thus,  the 
completeness  condition  is  satisfied.  If  the  two  graphs  Go  and  Gi  are  not  iso¬ 
morphic,  then  the  graph  H  sent  to  the  verifier  by  the  prover  (by  any  prover, 
in  fact)  cannot  be  isomorphic  to  Both  Go  and  Gi,  and  the  fact  that  the  veri¬ 
fier  chooses  the  bit  a  at  random  means  that  vdth  probability  1/2  the  verifier 
will  ask  the  prover  for  an  isomorphism  between  H  and  the  graph  to  which 
H  is  not  isomorphic,  which  the  prover  will  certainly  be  unable  to  do.  The 
probability,  therefore,  that  a  prover  (any  prover)  will  be  able  to  supply  the 
requested  isomorphism  on  each  iteration,  and  hence  cause  the  verifier  to  ac¬ 
cept  incorrectly,  is  at  most  1/2”.  Thus,  the  soundness  condition  is  satisfied, 
and  the  protocol  is  an  interactive  proof  system  for  graph  isomorphism. 

This  discussion  shows  that  the  verifier  can  use  the  protocol  above  to 
determine  (with  the  prover’s  help)  whether  two  graphs  are  isomorphic.  It 
is  not  initially  clear,  however,  that  the  verifier  cannot  use  this  protocol  in 
some  “unauthorized”  way  to  determine  whether  some  other  fact  is  true.  For 
example,  suppose  that  the  verifier  chooses  the  a’s  in  some  way  depending  on 
the  graphs  H  sent  by  the  prover,  rather  than  choosing  the  a’s  at  random  as 


required  by  the  protocol.  Is  it  possible  for  the  verifier  to  use  the  protocol  in 


this  way,  and  then  compute  whether  a  certain  value  x  is  a  quadratic  residue 


modulo  n,  where  n  is  the  number  of  vertices  in  the  two  graphs;  or  then 
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determine  whether  one  of  Gq  or  Gt  is  isomorphic  to  a  third  graph  Gl  The 
intuition  behind  zero  knowledge  is  that  such  use  of  the  protocol  should  be 
impossible. 


5.2.3  Zero  knowledge  proof  systems 

This  intuition  that  the  verifier  cannot  use  the  graph  isomorphism  protocol 
to  determine  the  truth  of  facts  other  than  whether  the  two  input  graphs  Gq 
and  Gl  are  isomorphic  is  captured  as  follows.  Loosely  speaking,  we  say  that 
an  interactive  proof  system  (P,  V)  is  zero  knowledge  if,  whenever  x  £  L,  the 
verifier  is  able  to  generate  on  its  own  the  conversations  it  could  have  had 
with  the  prover  during  an  interactive  proof  olx  £  L.  Consequently,  the  fact 
X  G  jL  is  the  only  knowledge  gained  by  the  verifier  as  a  result  of  the  proof 
of  X  G  L:  if  the  verifier  is  able  to  determine  the  truth  of  some  other  fact 
after  conversations  with  the  prover,  then  the  verifier  is  able  to  determine  the 
truth  of  the  fact  on  its  own  by  generating  these  conversations  on  its  own. 
In  particular,  if  it  is  possible  for  the  verifier  to  use  the  graph  isomorphism 
protocol  to  determine  whether  one  of  Gq  or  Gi  is  isomorphic  to  a  third  graph 
G,  then  it  is  possible  for  the  verifier  to  determine  the  truth  of  this  fact  on  its 
own  without  even  talking  to  the  prover. 

The  intuition  that  the  verifier  can  generate  these  conversations  on  its  own 
is  captured  as  follows.  Consider  runs  of  the  protocol  (P,  V)  with  input  x  and 
work  tapes  a  and  t.  From  the  verifier’s  point  of  view,  a  conversation  with  the 
prover  (that  is,  a  run)  is  uniquely  determined  by  the  verifier’s  local  history 
of  the  run,  where  the  verifier’s  local  history  is  the  sequence  of  local  states 
the  verifier  assumes  during  the  run.  Intuitively,  when  we  say  that  the  verifier 
cam  generate  on  its  own  the  conversations  it  has  with  the  prover,  we  mean 
there  is  a  Turing  machine  M  that  on  input  x  and  t  generates  local  histories 
with  the  same  distribution  the  verifier  would  see  these  local  histories  during 
runs  of  (P(j),  V(t))(x).® 

^This  intuition  is  formulated  slightly  differently  in  [GMR89].  They  liote  th&t,  given  x 
and  t,  the  verifier’s  view  of  a  conversation  with  the  prover  is  uniquely  determined  by  the 
finite  sequence  p  of  random  bits  it  uses  during  the  conversation  together  with  the  finite 
sequence  a(i,...,an  of  messages  it  receives  from  the  prover;  everything  else  the  verifier 
sees  during  the  conversation  (e.g.,  the  messages  it  sends)  can  be  efficiently  computed  given 
this  information.  They  call  the  tuple  (p,  cci, . . . ,  o„)  the  verifier’s  view  of  the  run,  and  say 
that  the  verifier  can  generate  on  its  own  the  conversations  it  has  with  the  prover  if  there 
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This  is  made  precise  as  follows  (cf.  [GMR89,  GMW86,  Ore87]).  Suppose 
we  have  some  domain  Dom  whose  elements  are  of  the  form  (x,y),  where  z  is 
a  string  and  y  is  a.  vector  of  strings.  Suppose  for  each  (z,  y)  G  Dom  we  have 
two  random  variables  Ux,jj  and  14,^  together  with  their  associated  probability 
distributions.  The  families  {17*, j  :  {x,y)  €  Dom}  and  {14,5  •  (®>y)  ^  Dom} 
are  said  to  be  perfectly  indistinguishable  if  the  distributions  of  Ux,g  and  Vx,g 
are  identical  for  all  (z,y)  6  Dom. 

Given  an  interactive  protocol  (P,  V*),  we  denote  by  (P(5),  Vl*(t))(z)  the 
random  variable  assuming  as  values  the  verifier’s  local  histories  of  runs  of 
(P(s),  V’*(t))(z),  where  the  distribution  is  determined  by  the  coins  fiipped 
by  the  prover  and  the  verifier.  More  precisely,  we  define  {P{s),Vf{t)){x)  to 
be  the  function  mapping  a  sequence  p  of  coin  flips  to  the  the  verifier’s  local 
history  of  that  run  of  (P(s),  V’*(t))(z)  in  which  p  is  the  sequence  of  coins 
fiipped  by  the  prover  and  the  verifier.  Given  a  probabilistic  Turing  machine 
M,  we  denote  by  M(t,  z)  the  random  variable  assuming  as  values  the  outputs 
generated  by  M  on  inputs  t  and  z,  where  the  distribution  is  determined  by 
the  coins  flipped  by  M.  An  interactive  proof  system  (P,  V)  for  L  is  said  to 
be  perfect  zero  knowledge  (cf.  [GMR89])  if  for  every  verifier  V*  there  is  a 
probabilistic  Turing  machine  My  such  that 

1.  Mv‘{t,x)  runs  in  expected  time  polynomial  in  |z|,  and 

2.  the  families 

{(P(s),  y*(t))(z) :  (x,s,t)  €  Dom}  and  {Mv(t,x) :  (x,s,t)  €  Dom} 

are  perfectly  indistinguishable,  where  (x,s,t)  €  Dom  iff  z  G  L,  s  is  a 
possible  input  f'-  P,  and  <  is  a  possible  input  for  V*. 

This  definition  says  an  interactive  proof  system  is  perfect  zero  knowledge 
if  the  verifier  V*  cein  generate  local  histories  on  its  own,  using  My ,  with 
precisely  the  same  distribution  it  woiild  see  these  local  histories  during  runs 
of  (P,  V*)  on  input  z  with  s  and  t. 

is  a  Turing  machine  M  that  on  input  x  and  t  generates  views  with  the  same  distribution 
the  verifier  would  see  these  views  during  runs  of  (P,  v')  on  input  x  with  a  and  t.  The 
two  formulations  are  equivalent,  of  course,  since  the  local  history  is  efficiently  computable 
from  the  view,  and  vice  versa. 
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It  is  not  too  hard  to  show,  for  example,  that  the  interactive  protocol  for 
graph  isomorphism  given  above  is  actually  zero  knowledge  [GMW86].  To 
see  this,  fix  a  verifier  protocol  V*,  and  let  us  construct  a  simulating  Turing 
machine  Mv»  that  generates  local  histories  (one  local  state  at  a  time)  with 
the  same  distribution  the  verifier  would  observe  during  runs  of  (P,  F*).  The 
Turing  machine  My  is  defined  as  follows:  for  each  round  x  =  1, . . .  ,n, 


1.  My  first  tries  to  guess  the  bit  a,-  that  V*  will  choose:  My  chooses  a 
random  bit  13. 

2.  My  then  chooses  a  random  permutation  tt^  of  the  nodes  of  and 
writes  on  F*’s  input  communication  tape  the  isomorphic  copy  Hi  of 
Gff  defined  by  (7ri(u),  7r,(u))  is  an  edge  of  Hi  iff  (u,u)  is  an  edge  of  Gp. 

3.  My  simulates  the  Turing  machine  V*  until  V*  writes  a  bit  a,-  on  its 
output  communication  tape. 

4.  My  reads  a,-. 

(a)  If  a,-  =  then  My  writes  7r<  on  V^’s  input  communication  tape 
and  outputs  the  verifier’s  local  state.  More  precisely,  Mv»  outputs 
three  local  local  states:  the  state  after  the  prover  sends  Hi,  the 
state  after  the  verifier  sends  Oi,  and  the  state  jdter  the  prover 
sends  7r{. 

(b)  li  Oi  ^  (3,  then  My  rewinds  V*  to  its  configuration  at  the  be¬ 
ginning  of  this  iteration  (this  includes  erasing  Hi  from  F*’s  input 
communication  tape)  and  repeats  steps  1-4. 


The  first  key  observation  here  is  that,  when  the  two  graphs  Go  and  Gi 
are  isomorphic,  a  random  permutation  of  Go  is  a  random  permutation  of  Gi . 
It  follows  that  the  probability  of  generating  H  by  choosing  Gp  at  random 
and  choosing  a  permutation  of  Gp  at  random  is  equal  to  the  probability 
of  generating  H  by  choosing  a  permutation  of  Go  at  random.  The  second 
key  observation  is  that,  although  My  may  have  to  try  a  number  of  times 
before  it  can  finish  the  xth  iteration  and  generate  the  graph  Hi,  the  tries 
are  independent.  It  follows  bhav  ^iv^ucbbiiiby  a  grapn  is 

generated  on  the  Ath  try  for  the  ith  iteration,  given  that  the  first  A  —  1 
tries  have  failed,  is  the  same  for  all  A;  and  hence  that  the  probability  H  is 
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generated  by  My  on  the  ith  iteration  is  equal  to  the  probability  the  prover 
outputs  H  in  the  ith  round.  Since  the  remtunder  of  the  ith  round  is  simply 
a  simulation  of  {P,  V*),  it  follows  that  the  distributions  generated  by  (P,  F*) 
and  Mv«  are  identical.  (We  leave  it  to  the  reader  to  verify  that  the  expected 
number  of  tries  required  for  Mv*  to  complete  the  ith  iteration  is  2,  and  hence 
that  Mv»  runs  in  expected  polynomial  time;  see  [GMW86].) 

The  requirement  that  My  generates  local  histories  with  precisely  the 
same  distribution  with  which  they  occur  during  runs  of  (P,  y*),  however,  is 
a  very  strong  requirement.  Since  we  are  interested  in  what  a  polynomial- time 
verifier  can  learn  as  a  result  of  a  conversation  with  the  prover,  it  should  be 
sufficient  if  (cf.  [GM84])  no  polynomial- time  test  (meaning  no  test  that  can 
be  used  by  a  polynomial-time  verifier)  can  detect  any  difference  between  the 
distributions  generated  by  My*  and  (P,  F*). 

This  intuition  is  formalized  as  follows  (cf.  [GMR89,  GMW86,  TW87, 
Ore87]).  Two  families  {f4,y  :  (®,y)  €  Dom}  and  {14, j  :  (®,  jl)  €  Dom}  of 
random  variables  are  said  to  be  polynomially  indistinguishable  if  for  every 
probabilistic,  polynomial-time  algorithm  M  and  every  constant  k>l  there 
exists  a  constant  IV^.k  >  1  such  that  for  all  x  with  |®|  >  NM,k  and  all  y  with 
{x,y)  €  Dom  we  have 

\pr[M  accepts  —  pr[M  accepts  14,j]|  <  |®|~* . 


It  is  important  to  notice  that  the  probability  is  being  taken  over  both  the 
coin  flips  of  M  and  the  distributions  of  Ux,fi  and  14,5.  It  is  also  important 
to  notice  that  the  quantification  over  x  (e.g.,  the  common  input)  is  not  the 
same  as  the  quantification  over  j/  (e.g.,  the  auxiliary  inputs  to  the  prover  and 
verifier). 

The  definition  of  what  it  means  for  an  interrictive  proof  system  (P,  F) 
for  L  to  be  {polynomially)  zero  knowledge  is  obtrined  by  replacing  perfect 
indistinguishability  with  polynomial  indistinguishability  in  the  definition  of 
perfect  zero  knowledge.  This  definition  of  zero  knowledge  is  actually  the  def¬ 
inition  given  in  [GMW86]  (and  also  in  [Ore87]).  This  is  the  definition  of  zero 
knowledge  we  use  in  the  remainder  of  this  chapter.  Other  notions  of  zero 
knowledge  based  on  other  notions  of  indistinguishability  {statistical  indistin- 


guishahiliiy  and  computational  indisiinguiskahiliiy)  are  defined  in  [GMR89]. 


Since  these  notions  of  indistinguishability  imply  polynomial  indistinguisha- 


bility,  and  since  our  results  are  proven  in  the  context  of  polynomial  indistin- 
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guishability,  our  results  (Theorems  5.5,  5.6,  and  5.7)  hold  in  the  context  of 
these  other  notions  of  indistinguishability  as  well. 


5.3  Knowledge 

With  these  examples  in  mind,  we  now  define  notions  of  knowledge  for  use  in 
the  analysis  of  cryptographic  protocols.  Among  other  things,  these  examples 
have  two  distinguishing  features. 

First,  they  are  probabilistic.  Correctness  conditions  (such  as  the  sound¬ 
ness  and  completeness  conditions  for  an  interactive  proof)  guarantee  that 
given  properties  hold  with  high  probability,  but  not  with  certainty.  Thus, 
while  agents  are  justified  in  having  a  high  degree  of  confidence  that  these 
properties  hold,  agents  do  not  know  they  hold.  Clearly,  some  definition  of 
knowledge  incorporating  probability  such  as  probabilistic  knowledge  defined 
in  Chapter  4  will  be  useful  here. 

Second,  and  most  important,  the  security  of  a  zero  knowledge  proto¬ 
col  depends  on  the  fact  that  the  verifier’s  computational  power  is  restricted 
to  polynomial  time,  since  the  protocol’s  security  depends  on  the  fact  that  a 
polynomial-time  agent  cannot  distinguish  distributions  on  local  histories  gen¬ 
erated  by  Mv  and  (P,  F*).  In  general,  a  common  feature  of  cryptographic 
protocols  is  the  use  of  computational  intractability  to  keep  information  se¬ 
cret.  While  we  are  willing  to  accept  the  fact  that  an  infinitely  powerful 
verifier  might  be  able  to  make  unexpected  use  of  a  zero  knowledge  proof, 
we  are  not  willing  to  accept  the  possibility  a  polynomial-time  agent  could 
increase  its  knowledge  in  the  same  way.  To  study  such  protocols  in  terms 
of  knowledge,  therefore,  requires  a  definition  of  knowledge  that  accounts  for 
bounds  on  an  agent’s  computational  power. 

Recall  that,  while  the  need  for  such  defitutions  of  knowledge  accounting 
for  an  agent’s  computational  powers  is  acutely  apparent  in  the  context  of 
cryptography,  we  have  already  seen  the  need  for  such  definitions  in  Chapter 
3,  even  in  the  absence  of  cryptography.  In  the  sending  or  receiving  omis¬ 
sions  models,  the  tests  for  common  knowledge  used  by  an  agent  to  determine 
whether  a  fact  is  common  knowledge  are  easily-computable  functions  of  the 
agent’s  local  state.  The  same  is  typically  true  for  most  work  in  the  literature 
using  knowledge  to  analyze  distributed  computation.  For  this  reason,  using 
information-theoretic  definitions  of  knowledge  (definitions  that  do  not  take 
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into  account  agents’  limited  computational  power)  does  not  lead  to  trou¬ 
ble.  In  the  generalized  omissions  model,  however,  the  same  tests  for  common 
knowledge  are  no  longer  easily  computable.  We  therefore  concluded  in  Chap¬ 
ter  3  that  information-theoretic  definitions  do  no  capture  all  relevant  aispects 
of  simultaneous  coordination  in  this  model.  A  major  challenge  presented 
here,  therefore,  is  to  define  knowledge  in  a  way  that  accounts  for  bounds  on 
agents’  computational  powers. 

5.3.1  Knowledge  and  Probability 

As  we  saw  in  Chapter  4,  there  are  a  number  of  meaningful  definitions  of 
probabilistic  knowledge  in  the  context  of  synchronous  systems,  systems  such 
as  the  ones  we  consider  here.  Since  the  prover  p  seems  to  be  the  natural 
choice  for  the  verifier’s  “opponent”  in  an  interactive  proof  system,  arguments 
in  Chapter  4  imply  that  the  assignment  that  conditions  on  the  joint 
knowledge  of  both  the  prover  and  the  verifier,  is  the  “right”  assignment  for 
the  verifier  to  use.  In  this  chapter,  however,  we  will  use  the  assignment 
the  assignment  that  assigns  to  an  agent  and  a  point  c  the  probability  space 
of  all  points  with  c’s  global  state. 

The  choice  of  this  assignment  is  due  to  the  fact  that  we  will  be  interested 
in  the  truth  of  formulas  of  the  form  Jffv?  at  time  0  points.  In  Chapter  4, 
we  noted  that  all  of  the  assignments  V^'*,  P*’,  7^'““,  and  even  are 

equivalent  at  time  0;  that  is,  the  probability  spaces  they  assign  to  a  given 
agent  at  a  given  point  are  identical.  This  means  that  a  formula  K^ip  is  true 
at  time  0  with  respect  to  one  of  these  eissignments  iff  it  is  true  with  respect  to 
all  of  them.  Consequently,  from  a  semantic  point  of  view,  the  exact  choice  of 
the  assignment  is  irrelevant.  From  a  computational  point  of  view,  however, 
has  several  advantages.  First,  the  probability  spaces  assigned  by  V'** 
are  independent  of  the  agent  (they  depend  only  on  the  current  global  state). 
Second,  the  probability  space  assigned  to  a  point  is  uniquely  determined  by 
the  distribution  on  the  runs  extending  this  point.  Since  interactive  and  zero 
knowledge  proof  systems  are  defined  in  terms  a  distribution  on  runs  (that  is, 
the  distribution  on  the  runs  extending  initial  points),  the  definition  of 
seems  most  closely  related  to  the  definition  of  such  proof  systems.  Finally, 
the  simple  nature  of  definition  will  simplify  our  analysis  slightly. 

With  these  observations  in  mind  (that  we  can  prove  our  results  in  terms 
of  and  know  they  will  know  in  terms  of  any  other  assignment  of  interest. 
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and  that  V^'*  simplifies  our  analysis),  we  fix  as  the  probability  assign¬ 
ment  used  in  our  analysis.  Having  fixed  the  assignment  V***,  we  can  safely 
omit  V*'*  from  the  left  side  of  the  tumstyle  ‘1=*  in  formulas  involving  prob¬ 
abilistic  knowledge  without  introducing  any  ambiguity.  Furthermore,  since 
the  operators  Pr,-  are  identical  for  all  agents  p,-,  we  can  omit  the  subscript 
i.  We  reiterate  the  point  made  in  Section  5.2.1  concerning  the  formulas 
‘pr[^]  >  a’  and  ‘Prfys]  >  ot’:  we  write  ‘pr{^]  >  a’  when  the  underlying 
probability  distribution  is  sequences  of  coin  flips,  and  ‘Pr[^]  >  a’  when  this 
distribution  is  points  or  runs  (and,  in  particular,  when  ‘Pr[v>]  >  at’  is  meant 
to  be  interpreted  as  a  formula  in  our  language  of  knowledge  and  probability). 


5.3.2  Knowledge  and  Computation 

We  now  turn  our  attention  to  definitions  of  knowledge  that  account  for 
an  agent’s  limited  computational  power.  Intuitively,  we  want  to  restrict  an 
agent’s  knowledge  to  what  it  can  compute.  As  Moses  discusses  in  [Mos88], 
however,  there  is  more  than  one  way  to  do  this.  The  motivation  for  our 
definition  is  that  we  want  to  use  our  definition  of  knowledge  to  construct  and 
analyze  protocols.  The  tests  an  agent  uses  to  determine  what  it  knows  (and 
hence  what  actions  to  perform)  in  the  course  of  a  protocol  are  allowed  to  be 
virtually  any  function  of  the  agent’s  local  state.  The  only  thing  that  restricts 
the  tests  an  agent  can  perform  is  the  agent’s  limited  computational  power. 
This  is  the  fundamental  intuition  underlying  our  definition  of  practical  knowl¬ 
edge,  a  definition  of  knowledge  incorporating  both  probability  and  bounds 
on  an  agent’s  computational  resources.  The  exact  definition  of  practical 
knowledge  is  best  motivated  by  way  a  sequence  of  intermediate  definitions. 


Resource-bounded  knowledge 


The  definition  of  resource-hounded  knowledge  given  in  [Mos88]  succinctly 
captures  this  intuition  that  it  is  the  bounds  on  an  agent’s  computational 
resources  that  restrict  the  tests  the  agent  can  perform,  and  hence  what  the 
agent  can  know.  Loosely  speaking,  this  definition  says  that  a  polynomial- 
time  agent  knows  a  fact  only  if  there  is  a  polynomial-time  test  the  agent  can 
use  to  determine  that  it  knows  this  fact.  This  intuition  can  be  generalized  to 


any  complexity  class  (see  [Mo888]),  and  not  just  polynomial-time.  However, 
since  cryptography  is  typically  concerned  with  what  an  agent  can  learn  using 
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probabilistic  tests  running  in  time  polynomial  in  some  parameter  determined 
by  its  local  state  (a  parameter  suck  as  |x|»  the  length  of  the  common  input), 
the  class  BPP  seems  to  be  the  complexity  class  of  most  relevance  to  cryp¬ 
tography.  We  therefore  restrict  our  attention  to  knowledge  with  respect  to 
the  class  BPP. 

The  notion  of  a  BPP  test  an  agent  can  use  to  determine  whether  it  knows 
a  fact  <p  can  be  made  precise  as  follows.  Given  a  system  R,  a  probabilistic 
algorithm  M  is  said  to  be  a  BPP  test  for  Kq<p  in  R  if,  for  all  points  (r,  m) 
of  il, 

1.  M’s  input  is  q^s  local  state  7*,(m), 

2.  M  runs  in  time  polynomial  in  |®|,  where  x  is  the  common  input  recorded 
in  r,(m), 

3.  M  accepts  with  probability  at  least  2/3  if  (r,m)  \=  Kq<p,  and  rejects 
with  probability  at  least  2/3  if  (r,m)  ^  Kq(p.^ 

This  definition  essentially  says  that  the  language  of  local  states  r,(m)  sat- 
isfjTing  (r,m)  \=  Kq(p  is  in  BPP,  the  only  difference  being  that  the  BPP  test 
is  required  to  run  in  time  polynomial  in  |x|  and  not  |r,(m)|.  We  choose  |x| 
instead  of  |r,(7n)|  because  it  seems  to  be  the  preferred  parameter  in  the  con- 
text  of  interactive  proofs.  Interactive  protocols  (P,  V")  and  simulating  Turing 
machines  My ,  for  example,  are  both  required  to  run  in  time  polynomial  in 
|z|,  and  not,  say,  in  |x|,  |s|,  and  |t|.  In  all  interactive  proofs  we  are  aware  of, 
however,  the  size  |rv(m)|  of  the  verifier’s  local  state  is  polynomial  in  |x|. 

We  can  now  make  precise  the  intuition  that  an  agent  knows  a  fact  only  if 
it  can  compute  that  it  knows  this  fact.  Given  a  system  R,  an  agent  q  is  said 
to  BPP-know  at  a  point  c  of  ij,  denoted  by  c  1=  if 

1.  (r,m)  1=  Kqtp,  and 

2.  there  is  a  BPP  test  for  Kq(p  in  R. 

^The  probability,  of  course,  is  being  taken  over  M’s  coin  flips.  We  note  that  there 
is  nothing  special  about  the  value  2/3.  We  can  use  any  value  bounded  above  and  away 
xxom  xyT  XU  AoCii,  XX  IS  easy  to  replace  any  such  value  with  1  —  by  using  the  standard 
trick  of  running  the  original  test  M  many  times  to  estimate  the  probability  with  which  M 
accepts  or  rejects. 
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Thus,  a  processor  BPP-knows  <p  if  it  knows  y?  and  there  is  a  BPP  test  it  can 
use  to  compute  that  it  knows  tp. 

To  get  a  better  feeling  for  how  this  definition  behaves,  consider  a  system 
in  which  an  agent’s  local  state  includes  two  integer-valued  variables  m  and 
n  (the  value  of  these  variables  might  be  determined  by  the  contents  of  the 
input  tape,  for  example),  and  suppose  that  for  every  pair  of  integers  and 
in  there  is  a  run  of  the  system  in  which  the  values  of  m  and  n  ate  and  *») 
respectively.  Consider  a  point  c  at  which  m  =  (n  ~  1)^  (mod  n).  Since  it  is 
very  easy  for  an  agent  to  check  that  •m  =  {n  —  1)®  (mod  u),  it  is  clear  that 
the  agent  BPP-knows  the  fact  -0  that  ‘m  =  (n— 1)®  (mod  «)’  at  the  point  c. 
Notice  that  if  m  =  (n  —  1)’  (mod  n),  then  m  is  a  quadratic  residue  modulo 
n  (that  is,  a  square  modulo  n).  Since  the  agent  BPP-knows  that  m  =  {n  —  lY 
(mod  n)  at  c,  it  is  natural  to  assume  that  the  agent  must  also  BPP-know  the 
fact  (f  that 'm  is  a  quadratic  residue  modulo  n’  at  c.  But  recall  that  in  order 
for  the  agent  to  BPP-know  the  fact  ‘m  is  a  quadratic  residue  modulo  n’  at  a 
point,  there  must  be  a  BPP  test  that  determines  whether  m  is  a  quadratic 
residue  for  arbitrary  m  and  n;  and  assuming  quadratic  residuosity  is  hard, 
this  is  impossible.  It  follows  that  the  agent  does  noi  BPP-know  the  fact  ‘m  is 
a  quadratic  residue  modulo  at  c  after  all.  Notice  that  the  agent  BPP-knows 
the  fact  vx  X  {n-~  1)’  (mod  n)  at  the  point  c,  and  since  the  implication 
“if  TO  =  (n  —  1)^  (mod  n),  then  m  is  a  quadratic  residue  modulo  n”  is  a 
tautology,  the  agent  clearly  BPP-knows  this  fact  as  well  (the  simple  test  that 
always  accepts  is  a  BPP-test  for  this  fact).  Consequently,  this  example  shows 
that,  unlike  the  information  theoretic  definition  of  knowledge,  it  is  possible 
for  an  agent  to  BPP-know  both  facts  ij)  and  if)  D  tp  without  BPP-knowing 
the  fact  tp.  The  agent  does  know  if)  ^<Pi  but  it  need  not  know  (p  itself.  In 
this  sense,  an  agent  no  longer  knows  all  consequences  of  its  knowledge  (that 
is,  everything  that  logically  follows  from  the  information  recorded  in  its  local 
state).  This  is  a  result  of  the  fact  that  this  definition  restricts  an  agent’s 
knowledge  to  what  it  can  compute.  The  reader  is  referred  to  [Mos88]  for  an 
interesting  discussion  of  this  and  other  properties  of  this  definition. 


A  notion  cf  learning 


The  definition  of  BP?  knowledge  restricts  an  agent’s  knowledge  to  what  it 


can  compute  by  requiring  the  existence  of  a  test  the  agent  can  use  at  all 
points  of  a  system  to  compute  whether  it  knows  a  given  fact.  In  this  sense. 
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BPP  knowledge  captures  what  an  agent  can  compute  on  its  own.  Some¬ 
times,  however,  it  is  possible  for  an  agent  to  ob'^ain  some  extra  information 
(possibly  from  another  agent  in  the  system),  and  with  this  extra  information 
the  agent  is  able  to  learn  things  it  couldn’t  have  computed  on  its  own.  This 
informal  notion  of  “learning”  is  of  great  importance  to  cryptography  (and, 
in  particular,  to  zero  knowledge  proof  systems).  Unfortunately,  it  does  not 
seem  possible  to  capture  this  notion  of  learning  directly  in  terms  of  resource- 
bounded  knowledge. 

To  understand  this  situation  more  clearly,  consider  again  the  system  in 
which  an  agent’s  local  state  contains  the  two  integer- valued  variables  m  and 
n,  and  consider  again  the  fact  ^  that  ‘m  is  a  quadratic  residue  modulo  n.’ 
As  we  have  seen,  it  is  impossible  for  an  agent  to  BPP-know  (p  since  there 
is  no  BPP  test  to  determine  whether  m  is  a  quadratic  residue  modulo  n  for 
arbitrary  m  and  n.  There  are,  however,  situations  in  which  it  does  seem 
to  make  sense  to  say  that  an  agent  knows  (p.  One  example  is  the  special 
case  in  which  m  =  (n  —  1)^  (mod  n).  A  more  interesting  situation  is  one 
in  which  an  agent  somehow  obtains  the  factorization  of  n,  and  hence  the 
agent  is  easily  able  to  compute  whether  <p  holds.  There  are  a  number  of  ways 
in  which  the  agent  might  obtain  this  factorization.  The  agent  might  find 
the  fv  ‘orization  in  one  of  the  messages  it  has  received  from  other  agents  in 
the  system  (e.g.,  from  the  prover  in  an  interactive  proof  system);  or,  more 
generedly,  it  might  be  able  to  deduce  the  factorization  from  the  contents  of 
these  messages  rather  than  finding  the  factorization  explicitly  contained  in 
one  of  the  messages.  In  either  ceise  it  seems  reasonable  to  say  that,  although 
the  agent  cannot  always  determine  whether  (p  holds,  in  these  cases  it  clearly 
can,  and  hence  can  be  said  to  know  (p.  More  generally,  for  any  difficult  to 
compute  fact,  once  an  agent  has  seen  a  proof  of  the  fact,  it  no  longer  seems  to 
make  sense  to  say  the  agent  does  not  know  the  fact  (although  it  certzdnly  did 
not  know  the  fact  before  seeing  the  proof).  Since  an  agent  caimot  BPP-know 
a  fact  like  (p,  however,  this  notion  of  learning  cannot  be  captured  directly  in 
terms  of  resource-bounded  knowledge. 


Knowledge  given  facts 


A 

?TC  bXlCfcW  d 


How,  then,  can  one  capture  tliis  notion  of  learning? 
number  of  ways  of  doing  so,  and  at  the  end  of  this  section  we  discuss  several 
alternatives  to  the  method  we  propose.  Our  approach,  however,  is  a  very 
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direct  one.  Recall  the  reason  we  felt  resource-boiinded  knowledge  could  not 
capture  this  intuition:  some  agent  may  fortuitously  obtain  some  information 
tf),  such  as  the  factorization  of  an  integer,  that  is  enough  for  the  agent  to  be 
able  to  determine  that  it  knows  a  fact  <p.  Our  idea  is  to  define  a  notion  of 
BPP  knowledge  of  ip  relative  to  a  fact  if).  Roughly  speaking,  this  means  we 
have  a  BPP  test  M  that  correctly  determines  whether  q  knows  tp  when  ij} 
is  true,  but  is  not  necessarily  correct  when  rj)  is  false.  However,  we  do  not 
want  the  results  of  this  test  to  be  completely  arbitrary  when  tli  is  false.  In 
particular,  we  Weint  to  be  able  to  trust  this  test  whenever  it  says  that  q  knows 
<P- 

One  way  to  capture  this  intuition  is  to  make  two  requirements  of  the  test 
M:  the  first  is  that  M  be  a  sound  test  for  KqP,  meaning  that  Kq<p  holds  at 
a  point  if  M  accepts  with  high  probability  at  that  point;  the  second  is  that 
M  be  a  complete  test  for  Kqtp  at  all  points  satisfying  r/},  meaning  that  M 
will  accept  with  high  probability  at  such  a  point  if  Kqip  holds  at  that  point. 
These  properties  together  guarantee  that  M  is  an  accurate  test  for  Kqtp  at 
points  satisfying  V*!  and  soundness  guarantees  that,  regardless  of  the  truth 
of  we  can  trust  M  when  it  says  Kq<p  is  true. 

To  make  this  precise,  we  proceed  as  follows.  We  say  that  a  test  M  is  a 
sound  test  for  a  fact  i?  at  a  point  c  if  c  [=  implies  that  M  rejects  at  c 
with  probability  at  least  2/3.  We  write  c  |=  sound{M,'d)  if  M  is  a  sound 
test  for  “d  at  c.  Similarly,  we  say  that  M  is  a  complete  test  for  ■5?  at  c  if 
c  1=  1?  implies  that  M  accepts  at  c  with  probability  at  least  2/3.  We  write 
c  1=  complete{M,'d)  if  Af  is  a  complete  test  for  at  c. 

We  capture  the  intuition  that  M  is  a  good  test  for  K^fp  when  V'  holds  as 
follows.  Given  a  system  R,  a  probabilistic  algorithm  M  is  said  to  be  a  BPP 
test  for  Kqip  given  -0  in  R.  if,  for  ?J1  points  (r,  m)  of  R, 

1.  M’s  input  is  q’s  local  state  r,(m), 

2.  M  runs  in  time.polynomial  in  |a;|,  where  x  is  the  common  input  recorded 
in  r,(m), 

3.  M  satisfies  the  following  properties: 

(a)  M  is  a  sound  test  for  Kg(p  on  R:  R  |=  sound(M,  K„(p). 

(b)  M  is  a  complete  test  for  Kqip  pven  R  [=  V*  D  complete{M,  K^(p). 
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We  remaxk  that  such  a  test  is  very  similar  to  the  solution  of  a  promise 
problem  as  defined  in  [ESY84].  A  promise  problem  {A,B)  is  a  partial  de¬ 
cision  problem  determined  by  two  predicates,  a  promise  A  and  a  property 
B.  A  Turing  machine  N  solves  (A,  B)  if,  for  every  x  satisfying  the  promise 
A(x),  the  machine  N  halts  on  input  x  and  accepts  on  input  x  iff  x  satisfies 
the  property  B{x).  So  JV  is  a  partial  decision  procedure  for  the  language 
L  =  {x  :  B(x)}:  it  correctly  determines  whether  x  6  L  when  the  promise 
A(x)  is  satisfied,  but  may  behave  arbitrarily  when  the  promise  A(x)  is  not 
satisfied.  Similarly,  M  is  a  decision  procedure  for  Kq(p  when  restricted  to 
points  satisfying  the  “promise”  'll),  but  may  behave  rather  arbitrarily  on  the 
remaining  points.  The  difference  between  a  solution  to  a  promise  problem 
and  suck  a  test  M  is  that  M  is  required  to  be  a  sound  test  for  K^ip  even 
when  V*  fails  to  hold. 

We  define  knowledge  of  a  fact  ip  given  tf)  as  follows.  Given  a  system  R, 
we  say  that  "q  kno'ws  ip  given  if)^  at  a  point  c,  denoted  by  c  |=  K^<p,  iff 

1.  c  hV*. 

2.  c\=  K^ip,  and 

3.  there  is  a  BPP  test  for  K^<p  given  tj)  in  R. 

The  last  two  conditions,  as  in  the  definition  of  BPP  knowledge,  require  that 
q  actually  knows  <p  and  that  there  exists  a  feasible  test  M  for  K^ip  that  is 
sound  in  general,  and  complete  given  'll).  The  first  condition  says  knowledge 
given  "0  holds  only  at  points  satisfying  0.  Intuitively,  these  points  are  the 
only  points  of  interest  since  these  are  the  only  points  where  the  promise  0  is 
true,  the  only  points  where  q  has  learned  the  information  suflicient  for  q  to 
correctly  determine  whether  it  know  ip.  The  fact  that  different  tests  M  are 
allowed  to  behave  differently  at  points  failing  to  satisfy  0  is  another  reason 
we  must  require  that  Kfip  hold  only  at  points  satisfying  0:  we  want  K^ip 
to  be  well-defined  at  all  points,  even  points  failing  to  satisfy  0  where  the 
required  behavior  of  our  tests  M  is  only  loosely  specified. 

To  understand  the  relationship  between  this  definition  of  knowledge  and 
resource-bounded  knowledge,  notice  that  if  0  is  the  fact  true,  then  K*tp 
is  equivalent  to  In  this  sense,  knowledge  given  a  fact  'll)  is  a.  direct 

generalization  of  resource-bounded  knowledge.  Furthermore,  notice  that  if 
0  is  testable  in  BPP  given  only  agent  q's  local  state  as  input,  then  K^ip  is 
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equivalent  to  A  ■0)-  In  general,  however,  we  do  not  restrict  the  facts 

0  to  be  testable  in  BPP,  and  in  this  case  it  does  not  appear  that  knowledge 
given  0  can  be  captured  directly  in  terms  of  BPP-knowledge. 

To  see  how  this  notion  of  knowledge  enables  us  to  capture  our  intuition 
concerning  .learning,  let  us  return  to  our  initial  example  in  which  an  agent 
q’s  local  state  includes  two  integer-valued  variables  m  and  n.  Let  (p  be  the 
fact  that  m  is  a  quadratic  residue  modtdo  n,  and  let  0  be  the  fact  that  the 
factorization  of  n  is  explicitly  given  in  the  messages  on  9’s  comm.unication 
tape.  Let  M  be  the  test  that  accepts  iff  the  factorization  of  n  is  explicitly 
given  in  the  messages  on  q’s  communication  tape  and  m  is  a  quadratic  residue 
modulo  n.  This  test  M  for  (p  is  cleaxly  sound  and  clearly  complete  given  0. 
Thus,  when  q  learns  from  the  factorization  of  n  on  its  communication  tape 
that  (p  is  true,  then  q  does  indeed  know  ip  given  0. 

We  note,  however  that  while  the  intuition  motivating  the  definition  of 
Kfip  is  that  0  is  some  additional  information  an  agent  might  obtain  that 
will  enable  it  to  determine  whether  it  knows  the  definition  of  K^tp  is 
more  general  than  this.  Suppose,  for  example,  that  0  is  the  fact  that  the 
prover  in  an  interactive  proof  is  the  good  prover.  Intuitively,  given  that  the 
verifier  is  talking  to  the  good  prover,  the  veiifier  knows  x  ^  L  when  it  rejects. 
The  fact  0,  however,  is  a  fact  whose  truth  can  never  be  determined  given 
only  the  verifier’s  local  state,  and  hence  doss  not  represent  some  information 
the  verifier  might  somehow  be  able  to  learn,  and  therefore  determine  that  it 
knows  X  ^  L.  In  this  case,  the  right  way  to  view  0  is  not  as  a  fact  the  verifier 
CM  learn,  but  as  a  condition  or  “promise”  whose  truth  guarantees  that  the 
verifier’s  test  M  accurately  determines  whether  it  knows  tp. 

Finally,  because  the  behavior  of  a  test  M  is  relatively  unrestricted  when 
the  condition  0  is  false,  and  because  an  agent  may  not  be  able  to  determine 
whether  0  is  true  or  false,  an  important  question  is  how  an  agent  q  is  to 
interpret  the  result  of  running  the  test  M.  What  meaning  should  q  assign 
to  the  probability  with  which  M  accepts?  Notice  that  M  can  accept  either 
with  probability  less  than  1/3  or  with  probability  greater  than  1/3  (and, 
in  particular,  with  probability  greater  than  2/3).  In  the  latter  case,  M’s 
soundness  guarantees  to  q  that  Kg<p  must  hold,  since  M  would  accept  with 
probability  less  than  1/3  if  Kq(p  did  not  hold.  On  the  other  hand,  q’s  ability 


to  a-ssign  meaning  to  M’s  accepting  with  probability  less  that  1/3  depends 


on  9’s  ability  to  deternaine  whether  0  is  true.  If  it  can  determine  that  0  is 


true,  then  it  is  guaranteed  that  ~'Kq(p  holds.  Otherwise,  the  test  M  gives  q 
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no  useful  information  about  whether  it  does  or  does  not  know  (p. 

This  discussion  illustrates  the  asymmetry  of  the  definition  of  K^(p.  In 
particular,  since  the  test  M  may  say  K^(p  does  not  hold  when  in  fact  it  does 
(this  can  happen  at  a  point  failing  to  satisfy  if)),  the  tests  associated  with 
K^(p  feel  more  like  tests  for  Kq(p  than  they  do  tests  for  ~'K^<p.  It  seems, 
however,  that  positive  tests  about  knowledge  tend  to  be  more  important 
that  negative  tests  in  the  context  of  cryptography.  In  the  case  of  zero  knowl¬ 
edge,  for  example,  our  intuition  does  not  say  that  the  verifier  does  not  know 
a  fact  ip  at  the  end  of  a  proof  of  x  €  i,  but  rather  that  if  the  verifier  does 
know  (p  at  the  end  of  a  proof,  then  it  also  knows  p  at  the  beginning.  No¬ 
tice  that  proving  a  polynomial-time  agent  does  not  know  a  fact  (say  a  fact 
it  knows  in  the  information-theoretic  sense)  would  probably  involve  proving 
something  about  issues  involving  P  versus  NP.  On  the  other  hand,  prov¬ 
ing  positive  statements  about  a  polynomial- time  agent’s  knowledge  involves 
the  construction  of  polynomial-time  tests,  which  is  typically  a  much  more 
tractable  task.  This  probably  explains  the  prevalence  of  positive  statements 
about  knowledge  in  cryptography. 

Practical  Knovirledge 

The  definition  of  practical  knowledge  itself,  the  ultimate  objective  of  this 
section,  is  obtained  as  a  result  of  the  following  observation:  a  probabilistic 
test  that  fails  on  a  negligible  portion  of  its  inputs  is  typically  considered  to 
be  just  as  good  as  one  that  never  fails.  Similarly,  in  the  context  of  zero 
knowledge,  the  fact  that  the  distributions  of  (P(5),  V^*(t))(x)  and  My{t,x) 
can  be  distinguished  by  a  polynomial-time  test  with  only  negligible  proba¬ 
bility  is  considered  to  be  just  as  good  as  if  the  two  distributions  cannot  be 
distinguished  at  all.  The  soundness  and  completeness  conditions  required  by 
the  definition  of  knowledge  given  if),  however,  do  not  allow  for  the  possibility 
that  a  given  test  M  might  fail  to  be  sound  or  complete  at  a  negligible  frac¬ 
tion  of  the  points  where  we  weint  it  to  be  sound  or  complete.  It  is  natural 
to  consider  relaxing  these  conditions  in  some  way.  In  order  to  do  this,  we 
must  first  determine  how  we  are  going  to  go  about  measuring  the  size  of  the 
set  of  points  where  the  test  M  fails.  Since  the  only  distribution  available 
during  probabilistic  computation  is  the  distribution  on  runs  induced  by  the 
coins  tossed  during  the  runs,  it  seems  most  natural  to  require  that  the  test 
behaves  correctly  at  all  points  of  all  but  a  negligible  fraction  of  the  runs. 
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Formally,  let  init  be  the  fact  holding  only  at  points  at  the  beginning  of 
a  run  (that  is,  at  time  0  points).  Given  a  system  R,  we  say  that  M  is  a 
practically  sound  test  for  <p  if  for  all  k  there  exists  a  such  that 

R  1=  init  D  Pr{D30und{M,  K^tp))  >  1  —  a  |®|~* . 

Similarly,  given  a  fact  we  say  M  is  a  practically  complete  test  for 
given  'll)  if  for  all  k  there  exists  a  such  that 

R  [=  init  D  Pr(D[V’  D  complete{M,  Kqip)])  >  1  —  a  |x|“* . 

Notice  that,  since  we  want  to  consider  tests  that  behave  correctly  on  all  but  a 
small  fraction  of  the  runs,  we  have  used  the  antecedent  init  in  the  definition  of 
practical  soundness  and  practical  completeness  to  ensure  that  the  probability 
is  being  taken  over  the  runs  of  the  system.  These  definitions  cire  equivalent 
to  saying  that  for  every  initial  global  state  of  the  system,  the  conditions 
sound{M,  Kqip)  and  ■0  D  complete{M,  Kqip)  hold  at  all  points  of  almost  all 
runs  extending  this  initial  global  state.  That  is,  these  conditions  are  state¬ 
ments  about  prior  probabilities.  We  coidd  have  considered  instead  tests  with 
the  stronger  property  that  they  behave  correctly  at  all  but  a  small  fraction 
of  the  points  extending  any  given  global  state  (by  deleting  the  antecedent 
init).  This  latter  notion  can  lead  to  dramatically  different  results  (recall  the 
anedysis  of  the  probabilistic  coordinated  attack  problem  given  in  Chapter  4), 
but  does  not  seem  appropriate  for  most  computer  science  applications.  In 
particular,  it  does  not  seem  appropriate  in  the  context  of  interactive  proofs: 
at  a  point  where  the  verifier  has  already  accepted,  it  no  longer  makes  sense 
to  expected  the  verifier  to  reject  with  high  probability,  even  when  x  ^  L. 

We  now  define  “g  practically  knows  ip  given  ij)”  at  a  point  c,  which  we 
denote  by  c  [=  Kftp,  in  precisely  the  same  way  as  we  defined  “g  knows  tp  given 
0,”  except  that  the  soundness  and  completeness  conditions  are  replaced  by 
practical  soundness  and  practical  completeness.  Formally,  c  |=  Kfip  iff 

1.  c  1=0, 

2.  c  [=  Kqip,  and 

3.  there  is  a  test  M  that  is  practically  sound  for  K^ip  and  practically 
complete  for  K^ip  given  0. 
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The  tilde  in  the  notation  K^(p  is  intended  to  denote  the  approximate  nature 
of  the  tests  M  guaranteed  by  the  definition  of  practical  knowledge.  To  say 
that  an  agent  practically  knows  (p  given  ip,  therefore,  means  that  the  agent 
knows  tp  and  has  a  test  that  quite  accurately  determines  whether  it  knows 
tp  at  points  satisfying  ip,  although  on  rare  occasions  (that  is,  in  a  negligible 
f-action  of  the  runs)  it  may  make  mistakes. 

Alternate  definitions 

As  we  have  mentioned,  there  are  several  alternatives  to  the  definition  of 
practical  knowledge.  Before  proceeding  to  show  how  practical  knowledge  can 
be  used  to  analyze  interactive  and  zero  knowledge  proof  systems,  we  discuss 
several  of  these  alternatives.  The  reader  interested  only  in  the  application 
of  practical  knowledge  to  interactive  and  zero  knowledge  proof  systems  can 
safely  skip  ahead  to  the  beginning  of  the  next  section. 

Recall  once  again  the  intuition  motivating  the  definition  of  practical 
knowledge:  as  a  result  of  learning  the  fact  ip  that  m  =  (n  —  1)^  (mod  n), 
an  agent  can  deduce  that  it  knows  the  fact  tp  that  m  is  a  quadratic  residue 
modulo  n.  Notice  that  in  this  case  the  fact  ip  is  actually  a  proof  of  the  fact  (p. 
In  general,  knowing  a  proof  of  a  fact  (p  is  equivalent  to  knowing  a  stronger 
fact  Ip  that  implies  (p.  Thus,  since  ip  is  presumably  easy  to  verify  and  p 
is  not,  instead  of  talking  about  knowing  p,  we  could  talk  about  knowing  ip 
(and  hence  p)  instead.  But  this  is  not  very  satisfactory.  Returning  to  our 
quadratic  residuosity  example,  what  interests  us  is  whether  the  agent  knows 
this  fact  p  that  m  is  a  quadratic  residue  modulo  n,  and  not  the  particular 
proof  of  p  the  agent  knows.  We  want  to  be  able  to  describe  protocols  in  terms 
of  knowledge,  such  as  “if  q  knows  p,  then  q  should  halt  and  accept”  If  all 
we  can  talk  about  are  the  various  proofs  ip  of  p,  however,  then  we  are  forced 
to  describe  this  protocol  indirectly  with  “if,  for  any  proof  ip  of  p,  agent  q 
knows  Ip,  then  q  should  halt  and  accept.”  Such  descriptions  seem  much  less 
desirable  than  the  first. 

To  avoid  this  problem,  one  might  be  tempted  to  define  a  notion  of  learning 
in  which  an  agent  learns  p  at  a  point  if  at  this  point  it  BPP-knows  a  fact  ip 
that  implies  p,  implicitly  existentially  quemtifying  over  all  possible  proofe  ip 
of  p.  Unfortunately,  this  notion  of  learning  is  not  very  useful  to  a  resource- 
bounded  agent.  It  could  be,  for  example,  that  at  every  point  c  the  agent  BPP- 
knows  a  different  fact  implying  p  (and  hence  has  “learned”  p  everywhere) 
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and  yet  is  unable  to  determine  at  a  particular  point  which  fact  '0c  it  should 
test  for  in  order  to  determine  that  it  knows  (p. 

Another  approach  one  might  be  tempted  to  tsdce  is  to  define  a  notion  of 
knowing  ip  with  respect  to  a  particular  test  M  where,  inform£illy,  ^  agent^ 
knows  (p  with  respect  to  M  if  using  the  test  M  the  agent  can  detenr’ue 
that  it  knows  (p.  We  remark  that  Fischer  and  Zuck  define  a  similar  notion 
of  knowledge  in  [FZ87],  but  based  on  RP  tests  instead  of  BPP  tests.  No¬ 
tice,  however,  that  in  some  sense  this  idea  is  very  similar  to  BPP-knowing 
a  particular  proof  0  of  since  we  can  always  take  the  proof  0  to  be  the 
fact  that  M  accepts  with  high  probability  (and  hence  tells  us  that  ip  holds). 
This  approach  consequently  shares  the  disadvantages  discussed  above.  On 
the  other  hand,  instead  of  being  forced  to  quantify  over  all  possible  proofs  0 
of  p  when  describing  protocols  as  we  did  above,  we  are  now  forced  to  quan¬ 
tify  over  all  proofs  0  and  all  tests  M  verifying  such  proofs,  compounding  our 
original  complaint.  Most  important,  however,  we  want  to  be  able  to  specify 
and  analyze  protocols  in  terms  of  knowledge  precisely  because  we  want  to  be 
able  to  abstract  away  the  particular  tests  being  used  when  we  think  about 
computation.  We  note  that  the  definition  of  resource-bounded  knowledge 
already  existentially  quantifies  over  such  tests  (so  these  tests  do  not  appear 
in  the  notation  used),  and  we  do  not  want  to  reintroduce  them  here. 

The  reader  may  still  wonder  about  the  asymmetry  of  our  definition.  Why 
do  we  require  soundness  at  all  points,  but  completeness  only  at  points  satisfy¬ 
ing  0?  Notice  that  if  we  strengthen  the  definition  to  require  both  soundness 
and  completeness  at  all  points,  then  we  have  essentially  returned  to  the  defi¬ 
nition  of  BPP  knowledge.  On  the  other  hand,  suppose  we  weaken  the  defini¬ 
tion  to  require  soundness  only  at  points  satisfying  0.  If  0  is  easily  testable, 
then  such  a  notion  of  knowledge  may  be  of  interest.  As  we  have  mentioned, 
however,  we  want  to  be  able  to  consider  facts  0  that  are  not  easily  testable, 
jind  in  this  context  this  weedcening  of  our  definition  becomes  rather  uninter¬ 
esting.  For  in  contrast  to  our  definition,  g’s  ability  to  zissign  any  meaning  to 
M’s  probability  of  acceptance  would  now  depend  on  g’s  ability  to  determine 
whether  0  is  true,  which  makes  M  of  little  use  if  testing  for  0  is  hard.  We 
could  instead  have  required  completeness  at  edl  points  and  soundness  only  a 
points  satisfying  0,  but  this  would  change  the  flavor  of  M’s  beha'vior  from 
being  primarily  a  test  for  Kqp  to  being  a  test  for  which  (as  we  have 

said)  seems  less  relevant  in  the  context  of  cryptography. 

Finally,  we  note  that  in  an  earlier  version  of  this  work  [HMT88]  we  defined 
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knowledge  with  respect  to  sets  of  points  A  instead  of  defining  knowledge  with 
respect  to  facts  i>.  Intuitively,  the  set  A  consisted  of  the  points  in  the  system 
(for  examplej  the  points  satisfying  some  fact  where  an  agent  has  obtained 
enough  information  to  be  able  to  determine  whether  it  knows  a  fact  <p.  The 
primary  disadvantage  of  this  way  of  defining  knowledge  is  that  the  logic  of 
knowledge  used  to  analyze  a  system  is  no  longer  independent  of  the  system 
being  analyzed.  It  is  no  longer  possible,  for  example,  to  argue  that  since  the 
formula  (p'  D  Kf(p  is  valid  in  one  system,  it  is  valid  in  a  second.  Instead 
we  must  argue  that  since  a  formula  like  tp'  D  K^<p  is  valid  in  one  system,  a 
formula  like  tp'  D  Kg<p  is  valid  in  a  second  for  some  set  B  of  points  related 
to  the  set  A  in  some  way  that  must  be  explicitly  specified.  Introducing 
such  sets  of  points  into  our  logic  results  in  losing  the  abstraction  from  the 
operational  nature  of  the  system  being  studied  that  motivated  us  to  avoid 
defining  knowledge  with  respect  to  particular  tests  M  in  the  first  place. 


5.4  Knowledge  and  Interactive  Proofs 

We  now  return  to  the  study  of  knowledge  and  interactive  proof  systems.  No¬ 
tice  that  the  cryptographic  definition  of  an  interactive  proof  system  really 
has  nothing  to  do  with  knowledge  or  computational  complexity.  It  is  simply 
a  statement  about  probability.  It  is  not  surprising,  therefore,  that  we  can 
immediately  translate  the  statements  of  soundness  and  completeness  in  the 
definition  of  an  interactive  proof  system  directly  into  our  language  of  prob¬ 
ability.  Recall  that  init  is  the  fact  holding  only  at  points  at  the  beginning 
of  a  run  (that  is,  time  0  points),  and  let  accept  be  the  fact  holding  only  at 
points  at  which  the  verifier  has  accepted. 

Proposition  5.1:  An  interactive  protocol  (P,  V)  is  an  interactive  proof  sys¬ 
tem  for  a  language  L  iff  the  following  conditions  are  satisfied: 

•  Completeness:  For  every  k>  I  there  exists  a  >  1  such  that 

P  xV  \=  init  D  Pr[x  €  L  D  Oaccept]  >l~ct  1®]“*' . 


-  u  ^ 


•  Dounaness:  rot  every  n 


1  Al. - -  ^  'S. 


1 


■p  X  V  t=  init  D  Pr(Oaccept  D  ®  €  L]  >  1  —  a  1®|  * . 
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The  proof  of  Proposition  6.1  (and  all  other  results  in  this  chapter)  can  be 
found  in  Appendix  5. A.  The  constajit  a  used  above  is  necessary  due  to  the 
fact  that  the  probabilistic  guarantees  made  by  the  definition  of  an  interactive 
proof  system  hold  only  for  sufficiently  large  x.  Notice  that  if  1  —  a  is 
negative,  then  Pr{(p)  >  1  —  a  |a!|~*  is  equivalent  to  Pr{<p)  >  0,  which  is  veilid 
for  every  fact  (p.  Consequently,  by  choosing  a  so  that  1  —  a  |x|“*  <  0  for 
insufficiently  large  x  we  obtain  a  formula  holding  for  all  x,  and  hence  valid 
at  all  points  of  the  system.  While  this  constant  a  does  not  appear  in  the 
formal  definition  of  an  interactive  proof  system,  an  equivalent  definition  of 
interactive  proof  systems  can  be  formulated  making  use  of  such  constants 
just  as  we  do  in  Proposition  5.1. 

According  to  Proposition  5.1,  a  formula  such  as  Pr[x  €  LD  Oaccept]  > 
1  —  a  |x|“*  holds  at  time  0  but  not  necessarily  at  later  points.  After  the 
verifier  has  rejected,  for  example,  it  is  clearly  not  the  case  that  with  high 
probability  the  verifier  will  eventually  accept.  In  general,  even  before  the 
verifier  has  actually  decided  to  accept  or  reject,  a  particularly  bad  sequence  of 
coin  Hips  can  significantly  lower  the  verifier’s  chances  of  eventually  accepting. 
Consequently,  the  antecedent  inti  is  crucial  in  the  formulas  above.  Intuitively, 
this  is  due  to  the  fact  that  the  verifier’s  probability  space  is  changing  with 
every  step.  Since  we  have  chosen  the  assignment  as  the  basis  for  our 
definition  of  probabilistic  knowledge,  an  assignment  associating  with  a  point 
the  set  of  points  having  the  seime  global  state,  an  agent’s  probability  space 
decreases  in  size  with  every  step.  The  seime  would  often  be  true  if  we  had 
chosen  any  other  consistent  assignment  such  as  or 

Since  the  facts  appearing  in  Proposition  5.1  are  valid,  all  agents  know 
these  facts  at  all  points.  Furthermoia,  all  agents  know  the  fact  init  whenever 
it  holds.  Since  from  K^init  and  K^{inH  D  i>)  we  can  deduce  we  can 
immediately  deduce  the  following  corollary  to  Proposition  5.1. 

Corollary  5.2:  An  interactive  protocol  (P,  V)  is  an  interactive  proof  system 
for  a  language  L  iif  the  following  conditions  are  satisfied: 

•  Completeness:  For  every  k>l  there  exists  a  >  1  such  that 

PxV\=  inU  D  F;^-“'*I"‘(x  G  X  D  Oaccept). 

•  Soundness:  For  every  fc  >  1  there  exists  a>l  such  that 

VxV\=  init  D  (Oaccept  D  x  e  L). 
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This  corollary  says  that  (P,  V )  is  complete  if  both  the  good  prover  and  the 
good  verifier  know  with  high  probability  that  if  ®  €  i,  then  the  good  prover 
will  convince  the  good  verifier  to  accept;  and  (P,  V)  is  sound  if  the  good 
verifier  knows  with  high  probability  that,  no  matter  what  protocol  the  prover 
is  running,  if  the  verifier  accepts  x  then  x  £  L. 

One  important  difference  to  notice  between  the  two  statements  is  that 
completeness  is  stated  with  respect  to  the  system  P  x  V  consisting  of  the 
good  prover  and  the  good  verifier,  while  soundness  is  statement  with  respect 
toVxV  consisting  of  arbitrary  provers  and  the  good  verifier.  In  the  a  system 
P*  X  V,  the  prover  P*  is  fixed  and  hence  the  verifier  knows  which  prover  it 
is  talking  to.  In  the  system  V  xV,  however,  the  verifier  may  consider  any 
prover  possible,  and  hence  cannot  know  the  identity  of  the  prover.  In  this 
way  we  are  able  to  capture  quite  simply  the  intuition  that  the  verifier  can  be 
confident  that  x  £  L  whenever  it  accepts,  regardless  of  which  prover  it  has 
been  talking  to. 

A  second  observation  worth  making  here  is  that  if  (P,  V)  is  sound,  then  it 
is  actually  the  case  that  (in  addition  to  the  verifier)  every  prover  also  know.-, 
with  high  probability  that  x  £  L  whenever  the  verifier  accepts;  that  is,  we 
could  have  replaced  by  in  the  statement  of  soundness 

above.  We  have  chosen  to  formulate  this  statement  in  terms  of  the  verifier’s 
knowledge  since  our  intuition  says  that  soundness  is  intended  to  be  primarily 
a  guarantee  to  the  verifier  (just  as  zero  knowledge  is  intended  to  be  primarily 
a  guarantee  to  the  prover). 

While  Corollary  5.2  shows  that  it  is  possible  to  chwacterize  interactive 
proof  systems  in  terms  of  knowledge  and  probability,  this  characterization  is  a 
reformulation  of  the  original  cryptographic  definition  in  terms  of  very  similar 
concepts.  It  does  not  significantly  clarify  our  intuition  concerning  interactive 
proof  systems,  other  than  making  explicit  this  distinction  between  what  is 
intended  to  be  a  guarantee  to  the  prover  and  what  is  a  guarantee  to  the 
verifier.  It  does  not  capture,  for  example,  the  intuition  that  at  the  end  of  an 
interactive  proof  of  -u  £  L  with  the  good  prover,  the  good  verifier  knows  that 
X  £  L  despite  its  limited  computational  power. 

In  what  way  can  the  verifier  be  said  to  know  whether  x  £  L  hi  the 
end  of  a  proof  oi  x  £  LI  If  our  intuition  is  correct,  the  verifier  knows 
X  £  L  whenever  it  accepts.  Consider  the  test  M  that  takes  as  its  input  the 
verifier’s  local  state  and  accepts  at  a  point  if  the  verifier  has  accepted  at  that 
point  and  rejects  otherwise.  Loosely  speaking,  the  soundness  condition  for 
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an  interactive  proof  implies  that  M  will  not  accept  when  x  ^  L,  and  the 
completeness  condition  implies  that  M  will  accept  when  ®  G  i  if  M  is  run 
at  the  end  of  a  proof  with  the  good  prover.  Let  us  denote  by  halted  the  fact 
holding  at  a  point  iff  at  that  point  the  verifier  hits  either  accepted,  rejected, 
or  otherwise  halted.  We  refer  to  a  point  satisfying  halted  as  a  final  point.  Let 
us  denote  by  ‘p  running  P’  the  fact  holding  at  a  point  iff  at  that  point  the 
prover  is  following  the  protocol  P.  Let  ^  be  the  fact  halted  A  ‘p  running  P\ 
Intuitively,  we  would  like  to  say  that  the  good  verifier  knows  x  €  Z  given  i{) 
at  the  end  of  a  proof  ofx  E  L  with  the  good  verifier.  Of  course,  the  test  M  is 
not  a  sound  test  for  x  €  Z  since  on  rare  occasions  the  verifier  may  incorrectly 
accept  when  x  ^  L,  and  M  is  not  complete  given  ip  for  similar  reasons.  On 
the  other  hand,  it  is  practically  sound  and  is  practically  complete  given  ip.  As 
a  consequence,  we  can  prove  the  following. 

Proposition  5.3:  If  (P,  V)  is  an  interactive  proof  system  for  Z,  then 

V  y  V  \=  (x  E  L  A‘p  running  P’)  D  Okf{x  E  Z), 
where  ip  ^  halted  A  ‘p  running  P\ 

In  fact,  we  can  essentially  prove  a  converse  of  this  proposition  as  well, 
which  shows  that  we  can  characterize  the  notion  of  an  interactive  proof  sys¬ 
tem  using  practical  knowledge. 

Proposition  5.4;  If 

VxW\={xELA‘p  running  P’)  D  Okf{x  E  Z), 


where  ip  halted  A  ‘p  running  P\  then  we  can  effectively  modify  V*  to 
obtain  V  such  that  (P,  V)  is  an  interactive  proof  system  for  Z. 


The  protocol  V  is  simply  the  protocol  V*  at  the  end  of  which  the  verifier 
uses  its  test  for  practic^ll  knowledge  of  x  €  Z  to  decide  whether  to  accept  or 
reject. 

These  results  tell  us  that  an  interactive  proof  system  for  Z  is  precisely 
one  that  guarantees  that  the  venjfier  will  practically  know  x  G  Z  at  the  end 
of  a  proof  of  X  G  Z  with  the  good  prover,  and  will  practically  never  be  fooled 
(by  any  prover).  We  remark  that,  having  reformulated  the  cryptographic 
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definition  of  an  interactive  proof  system  in  terms  of  our  logic  of  knowledge 
and  probability  (recall  Proposition  5.1),  the  proof  of  this  new  characteriza¬ 
tion  of  interactive  proof  systems  has  been  done  entirely  by  reasoning  about 
formulas  in  our  logic  of  knowledge  and  probability.  We  consider  this  to  be 
quite  important,  since  one  of  the  major  reasons  for  studying  cryptography  in 
terms  of  knowledge  is  to  be  able  to  reason  at  a  semantic  level  about  crypto¬ 
graphic  systems  without  delving  into  the  (often  complex)  operational  nature 
cryptographic  definitions  and  computation. 


5.5  Knowledge  and  Zero  Knowledge 

We  now  turn  our  attention  to  zero  knowledge  proof  systems,  and  show  how 
to  capture  the  intmtion  that  if  the  verifier  knows  a  fact  (p  at  the  end  of  a 
zero  knowledge  proof  of  a;  €  i,  then  the  verifier  knows  x  G  L  D  ip  si  the 
beginning  of  the  proof  as  well.  Since  this  inttiition  requires  that  <p  be  true 
at  the  beginning  of  a  proof  whenever  it  is  true  at  the  end  of  a  proof,  it  must 
be  a  fact  that  depends  only  on  the  information  contained  in  the  initial  state 
and  cannot  be  a  fact  like  “the  proof  is  over.”  Recall  that,  given  a  system 
R,.  a  fact  (p  is  said  to  be  a  fact  about  the  initial  state  if  (r,  m)  [=  y?  implies 
(r',m')  1=  (p  for  all  points  (r',m')  in  R  with  r(0)  =  r'(0).  That  is,  v’  is  a 
fact  about  the  initial  state  if  the  truth  of  ^  at  a  point  of  a  run  depends  only 
on  the  run’s  initial  state.  Restricting  our  attention  to  facts  about  the  initial 
state  is  not  much  of  a  restriction  in  practice  since  we  are  typically  concerned 
that  the  prover  will  leak  some  information  about  the  common  input  x  to  the 
verifier,  and  any  fact  about  x  is  in  particular  a  fact  about  the  initial  state 
(since  x  is  encoded  in  the  initial  state). 

The  following  theorem  captures  the  intuition  mentioned  above.  Roughly 
speaking,  it  says  that  if  x  G  L  and  the  verifier  has  a  nontrivial  chance  of 
learning  a  fact  <p  at  the  end  of  a  proof  of  ®  €  L,  then  the  verifier  can 
already  deduce  tp  from  x  G  L  on  its  own  at  the  beginning  of  the  proof 
without  interacting  with  the  prover.  Consequently,  provided  x  G  L,  the  only 
information  that  a  prover  leaks  to  the  verifier  in  a  zero  knowledge  proof  of 
X  G  L  are  facts  that  follow  firom  x  G  L.  In  this  sense,  the  verifier  lesurns 
essentiallj’^  notliing  as  a  result  of  the  proof  other  than  the  fact  x  G  L  the 
prover  set  out  to  prove.  However,  the  proviso  that  x  G  L  is  crucial  here. 
There  is  nothing  in  the  definition  of  a  zero  knowledge  proof  to  stop  the 
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prover  from  leaking  all  sorts  of  information  when  x  ^  L. 

Theorem  5.5:  Let  (P,  V)  be  a  zero  knowledge  proof  system  for  L,  let  V* 
be  an  arbitrary  verifier,  and  let  ^  be  a  fact  about  the  initial  state.  For  every 
fact  and  constant  A  >  1  there  is  a  fact  i}>*  and  a  constant  a  >  1  such  that 

PxV\={xeLA  init)  D  KfU  SLD  ^)]. 


The  statement  of  this  theorem  is  one  of  the  major  motivations  for  the 
definition  of  practical  knowledge.  We  want  to  capture  the  idea  that  if  the 
verifier  is  able  to  compute  something  on  its  own  as  a  result  of  obtaining 
some  extra  information  (represented  by  the  fact  ip)  from  the  prover  during 
the  course  of  a  proof,  then  the  verifier  is  already  able  to  compute  this  on  its 
own  at  the  beginning  of  the  proof.  BPP-knowledge  does  not  seem  to  let  us 
capture  this  intuition.  We  note,  however,  that  the  same  result  holds  when 
we  replace  practical  knowledge  given  ip  by  BPP-knowledge  given  but  this 
strengthening  of  the  hypothesis  (that  the  verifier  knows  <p  given  ip  at  the  end 
of  the  proof)  weakens  the  statement  of  the  theorem.  Furthermore,  the  char¬ 
acterization  of  interactive  proof  systems  in  terms  of  practical  knowledge  given 
by  Propositions  5.3  and  5.4  in  Section  5.3.2  indicates  thct  practical  knowl¬ 
edge  is  of  greater  relevance  to  interactive  proof  protocols.  Loosely  speaking, 
the  fact  Ip'  represents  the  condition  that  the  current  point  is  an  initial  point 
with  X  E  L,  and  that  from  this  initial  point  there  is  a  nonnegligible  chance 
that  Kf<p  will  hold  at  the  end  of  the  run.  The  test  fox  x  E  L  D  ip  that  the 
verifier  uses  at  such  points  essentially  runs  the  simulating  Turing  machine 
repeatedly  to  generate  local  histories  (since  x  E  L,  this  simulation  is  guaran¬ 
teed  to  be  quite  accurate),  and  runs  the  test  for  (p  at  the  end  of  each  of  these 
histories.  Since  this  test  will  succeed  at  the  end  of  a  nonnegligible  fraction 
of  these  histories,  by  generating  enough  of  them  the  verifier  is  almost  certain 
to  generate  one  such  history,  at  which  point  it  can  accept. 

Stepping  back  and  looking  at  the  statement  of  Theorem  5.5,  however,  we 
see  that  the  result  is  slightly  unsatisfactory.  The  reason  is  that  it  is  stated 
in  terms  of  the  system  P  xV*,  and  in  this  system  the  verifier’s  protocol  V* 
is  fixed  and  hence  known  to  the  prover.  In  contrast,  the  intuition  behind 
zero  knowledge  is  that  even  though  the  prover  does  not  know  the  identity  of 


the  verifier,  the  prover  knows  that  the  verifier  learns  nothing  at  the  end  of 


the  proof  other  than  x  E  L.  In  other  words,  our  intuition  suggrsts  that  the 


statement  of  Theorem  5.5  should  also  hold  in  the  system  P  x  V^. 
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Unfortunately,  we  cannot  prove  such  a  result.  Given  a  test  N  for  K„fp 
at  the  end  of  a  proof  of  x  €  £  in  the  system  P  xV*,  our  proof  of  Theorem 
5.5  constructed  a  test  M  for  K„{x  e  L  D  (p)  at  the  beginning  of  the  proof 
by  repeatedly  running  My*  to  generate  runs  of  P  x  V*  and  running  the  test 
N  at  the  end  of  the  generated  run.  In  order  to  do  the  same  thing  in  the 
system  P  x  V”*,  because  we  require  that  our  test  M  behave  correctly  at  all 
points  of  the  system,  M  must  fost  be  able  to  determine  the  identity  of  the 
simulating  Turing  machine  My  given  the  identity  of  the  verifier’s  protocol 
V*.  But  since  the  order  of  quantification  in  the  definition  of  zero  knowledge 
guarantees  only  that  for  every  verifier  V*  there  is  a  Turing  machine  x) 

approximating  the  distribution  of  (P(s),  U'*(t))(x),  there  is  no  guarantee  that 
there  is  a  uniform  way  of  choosing  My .  This  is  a  rather  subtle  point  brought 
out  by  our  framework. 

Since  the  source  of  this  trouble  seems  to  be  the  nonuniformity  of  My*, 
a  natural  solution  is  simply  to  require  that  the  simulating  Turing  machine 
is  indeed  uniform  in  the  verifier’s  protocol;  that  is,  require  that  one  Turing 
machine  M  using  U*  as  a  subroutine  can  simulate  the  runs  of  (P,  F*)  for  every 
verifier  protocol  U*.  We  remark  that  most  known  zero  knowledge  protocols 
already  have  this  property.  This  property  is  captured  by  the  notion  of  black¬ 
box  zero  knowledge.  An  interactive  proof  system  (P,  V)  for  L  is  said  to 
be  strongly  black-box  zero  knowledge  (cf.  [Gre87])  if  there  is  a  probabilistic 
Turing  machine  M  such  that 

1.  M{V*,t,x)  runs  in  expected  time  polynomial  in  |x|,  and 

2.  the  families 

{(P(s),U*(<))(x) :  (x,a,t)  €  Dom}  and  {Mv.(t,x) :  (x,s,t)  €  Dom} 

are  polynomially  indistinguishable,  where  (x,  V*,  s,  t)  G  Dom  iff  x  G  i/, 
V*  is  a  possible  verifier  protocol,  s  is  a  possible  input  for  P,  and  <  is  a 
possible  input  for  y*. 

If  (P,  y)  is  a  strongly  black-box  zero  knowledge  proof  system  for  L,  then  we 
can  prove  the  analogue  of  Theorem  5.5  (with  virtually  the  same  proof)  in 
the  system  P  x  instead  of  P  xV*: 
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Theorem  5.6;  Let  {P,  V)  be  a  strongly  black-box  zero  knowledge  proof  sys¬ 
tem  for  L,  and  let  ^  be  a  fact  about  the  initial  state.  For  every  fact  ^  and 
constant  k>l  there  is  a  fact  i{)'  and  a  constant  a  >  1  such  that 

PxV^\=ixeLA  init)  D  Z>  kfix  SLD  <p)]. 

Unfortunately,  as  the  name  suggests,  the  notion  of  strongly  black-box 
zero  knowledge  is  stronger  than  one  might  expect  most  protocols  to  satisfy. 
The  problem  is  that  in  practice  M( V*,  t,  a;)  runs  V*  as  a  subroutine  on  input 
X.  Even  if  M  runs  V*  only  once,  the  running  time  of  M  is  at  least  as  great 
as  the  running  time  of  V*.  Consequently,  even  if  we  restrict  our  attention 
to  polynomial-time  V*  as  input  to  M,  since  the  polynomial  bound  on  the 
running  time  of  V*  is  different  for  every  V*,  the  running  time  of  M  will  not  be 
bounded  by  any  single  polynomial.  Oren  avoids  this  problem  in  his  definition 
of  black-box  zero  knowledge  by  charging  only  one  time  step  for  a  call  to  V*. 
Thus,  he  is  essentially  viewing  M  as  an  oracle  machine  (rather  than  a  purely 
polynomial-time  Turing  machine).  We  could  modify  our  definitions  to  allow 
for  knowledge  with  respect  to  oracle  machines,  but  a  more  natural  solution  is 
to  modify  the  measure  we  use  of  a  test’s  complexity.  In  particular,  suppose 
we  consider  tests  for  facts  that  run  at  a  point  (r,7n)  in  time  polynomial 
in  1*1,  the  running  time  of  V*,  and  the  description  of  V*,  where  r  is  a  run 
with  input  x  in  which  the  verifier  is  running  the  protocol  V*.  Then,  defining 
a  notion  of  practical  knowledge  with  respect  to  such  tests,  the  analogue  of 
Theorem  5.5  follows  with  i-recisely  the  same  proof.  We  note  that  all  zero 
knowledge  protocols  we  are  aware  of  satisfy  this  notion  of  black-box  zero 
knowledge. 


5.6  Generation  and  Zero  Knowledge 

In  the  previous  section  we  formalized  the  idea  that  the  verifier  in  a  zero 
knowledge  proof  learns  essentially  nothing  but  the  fact  the  prover  sets  out 
to  prove.  This  is  not,  however,  the  strongest  notion  of  security  one  could 
hope  for.  It  would  also  be  desirable  to  show  that,  as  a  result  of  interacting 
with  the  prover,  the  verifier  cannot  do  anything  that  it  could  not  do  before 
the  interaction.  As  mentioned  in  the  introduction,  for  example,  there  is  a 
big  difference  between  knowing  an  integer  n  is  composite  and  being  able  to 
generate  a  factor  of  n. 
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We  abstract  the  idea  of  the  verifier  being  able  to  do  something  as  knowing 
how  to  generate  a  y  such  that  R{x,y),  where  R  is  simply  a  bineiry  relation. 
For  example,  if  iZ(x,  y)  holds  precisely  when  y  is  a  prime  factor  of  a  number 
X  on  the  input  tape,  then  being  able  to  generate  a  y  such  that  iZ(x,  y)  means 
being  able  to  find  a  prime  factor  of  x.  Notice  that,  as  in  the  case  of  factoring, 
many  natural  relations  R  are  testable  in  BPP  given  both  x  and  y  as  input, 
even  though  generating  a  y  satisfying  R(x,  y)  given  only  x  as  input  may  be 
intractable.  The  assumption  that  a  relation  R  is  testable  in  BPP,  therefore, 
is  generally  not  a  severe  restriction.  Formally,  a  relation  R  is  testable  in 
BPP  if  there  is  a  probabilistic  algorithm  running  in  time  polynomial  in  |x|, 
accepting  (x,y)  with  probability  at  least  2/3  if  i2(x,y),  and  rejecting  (x,y) 
with  probability  2/3  if  -’i2(x,y). 

Just  as  we  have  said  that  the  verifier  knows  a  fact  (p  if  it  has  an  algorithm 
to  test  for  (p,  we  would  like  to  say  that  the  verifier  knows  how  to  generate 
a  y  satisfying  R{x,y)  if  it  has  an  algorithm  to  generate  such  a  y.  When 
defining  knowledge  of  facts,  we  have  considered  tests  for  facts  (p  that  were 
sound  and  were  correct  given  that  a  certain  other  fact  tit  was  true.  Here, 
although  there  are  no  conditions  analogous  to  soundness  and  completeness, 
we  consider  algorithms  that  do  a  “good  job”  of  generating  y’s  such  that 
iZ(x,  y)  at  points  satisfying  V’>  but  may  not  perform  as  well  at  other  points. 
Given  a  system  R,  we  say  that  a  probabilistic  algorithm  M  is  a  generator  for 
R  given  for  an  agent  q  if  for  every  point  (r,  m)  of  R 

1.  M  takes  as  input  g’s  local  state  Tq{m)  at  (r,7n), 

2.  M  runs  in  time  polynomial  in  |x|,  where  x  is  the  common  input  recorded 
in  r,(m),  and 

3.  if  M  outputs  a  string  y  then  R{x,y)  holds,  and  if  (r,m)  satisfies  rp  then 
M  outputs  such  a  string  with  probability  at  least  2/3. 

This  requirement  that  M  never  incorrectly  outputs  a  string  y  failing  to  satisfy 
iZ(x,  y)  is  easy  to  enforce  when  R  is  testable  in  BPP. 

Given  a  system  R,  we  say  that  the  verifier  knows  how  to  generate  a  y 

satisfying  R{x,y)  given  V*  at  a  point  c,  which  we  denote  by  c  [=  Gjf  y.R{x,y), 
•  ^ 
u 


1.  c  1=  V',  and 
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2.  there  is  a  generator  for  R  given  ^  for  v. 

Before  we  continue,  it  is  helpful  to  consider  the  relationship  between  this 
definition  of  knowing  how  to  generate  and  the  definition  of  knowing  a  fact. 
It  is  naturad  to  suppose  that  knowing  a  fact  can  be  characterized  in  terms 
of  knowing  how  to  generate.  For  example,  suppose  (p{x)  is  a  fact  about  «, 
and  suppose  R  is  the  relation  defined  by  R{x,  1)  if  (p{x)  is  true  and  R{x,  0) 
if  (p{x)  is  false.  Knowing  how  to  generate  a  y  such  that  R^x^y)  given  V* 
implies  knowing  ^>(35)  given  To  see  this,  suppose  JV  is  a  generator  for  R 
given  V*)  suppose  M  is  the  test  for  ^(s:)  that  accepts  at  a  point  iff  N 
outputs  1,  and  rejects  otherwise.  M  must  be  sound,  since  N  never  outputs 
an  incorrect  string  y,  and  hence  N  outputs  0  if  it  outputs  anything  at  all 
when  (p{x)  is  fzdse.  On  the  other  hand,  M  must  be  complete  given  ip,  since 
at  points  satisfying  ip  the  generator  N  outputs  1  with  probability  2/3  when 
ip{x)  is  true,  and  hence  M  accepts  with  probability  2/3.  But  what  about  the 
other  direction?  Does  knowing  <p(x)  given  ip  imply  knowing  how  to  generate 
a  y  satisfying  R{x,  y)  given  ipl  If  R  is  testable  in  BPP,  then  an  agent  actually 
knows  how  to  generate  a  y  satisfying  R{x,  y)  given  the  fact  true,  and  hence 
also  given  the  fact  ip.  But  if  R  is  testable  in  BPP,  then  so  is  v’(»)  aud 
hence  so  is  membership  in  the  language  L.  For  more  interesting  languages  L, 
namely  languages  not  contained  in  BPP,  it  seems  possible  that  an  agent  can 
know  <p{x)  given  ip  without  knowing  how  to  generate  a  y  satisfying  R{x,y) 
given  Ip.  In  other  words,  knowing  the  existence  of  a  proof  that  x  €  L  seems 
to  be  different  from  knowing  how  to  generate  a  proof  that  x  E  L.  Intuitively, 
the  reason  for  this  is  that  a  BPP  test  M  for  knowledge  of  ^(x)  given  ip 
is  allowed  to  make  mistakes,  whereas  a  generator  N  for  R{x,  y)  given  ip  is 
not.  For  example,  given  such  a  test  M,  suppose  we  try  to  construct  such 
a  test  N  in  the  obvious  way  by  having  N  output  1  if  Af  accepts  and  0 
otherwise.  M  can  reject  outright  at  any  point  not  satisfying  ip  regardless  of 
whether  ^(x)  is  true,  and  at  such  points  N  incorrectly  outputs  0.  We  note, 
however,  knowing  how  to  generate  is  most  interesting  in  contexts  other  than 
language  membership,  contexts  such  as  factorization  sketched  above,  and  in 
these  contexts  the  relations  R  are  testable  in  BPP. 

In  any  case,  we  can  prove  the  following  analogue  to  Theorem  5.5  (with 
virtually  the  same  proof): 

Theorem  5.7:  Let  (P,  V)  be  a  zero  knowledge  proof  system  for  L,  let  V* 
be  an  arbitrary  verifier,  and  let  R{x,y)  be  a  relation  testable  in  BPP.  For 
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every  fact  rj)  and  constant  k>l  there  is  a  fact  ij)'  and  a  constant  a  >  1  such 
that 

PxV\=ixeLA  init)  D  Ki-^^^^~'[OGt  y.R{x,y)  D  G*' y.R{x,y)]. 

Intuitively,  this  statement  says  that  if  the  verifier  has  a  nonnegligible  chance 
of  being  able  to  generate  a  y  satisfying  i2(x,  y)  by  talking  to  the  prover,  then 
the  verifier  can  generate  such  a  y  on  its  own.  We  note  that  this  theorem 
has  a  number  of  natural  extensions.  One  simple  extension  is  from  generat¬ 
ing  y’s  satisfying  relations  i2(»,y)  to  generating  y’s  satisfying  facts  about 
the  verifier’s  entire  initial  state.  Another  simple  extension,  along  the  lines 
of  practical  knowledge,  is  a  notion  of  practically  knowing  how  to  generate, 
denoted  by  G^2/.i2(x,y),  where  the  algorithm  may  on  a  small  fcaction  of  the 
points  satisfying  V'  fall  to  generate  y  such  that  R{x,y).  A  final  extension, 
using  black-box  zero  knowledge,  allows  us  to  prove  an  analogous  result  in  the 
system  P  x  V”*. 

We  note  that  the  ability  to  test  the  relation  R  in  BPP  is  crucial  to  the 
proof  of  Theorem  5.7.  Recall  that  in  the  proof  of  Theorem  5.5  the  verifier 
tests  for  the  fact  <p  by  repeatedly  generating  runs  and  testing  for  (p  at  the 
end  of  each  run.  Since  this  test  for  <p  is  sound,  the  verifier  can  accept  aa  soon 
as  this  test  for  (p  accepts.  Here,  however,  since  there  is  no  notion  analogous 
to  soundness,  the  verifier  has  no  way  of  knowing  which  of  the  many  y’s  it 
generates  satisfies  R{x,y)  and  should  be  output  unless  the  relation  iE(x,y) 
can  be  tested  in  BPP.  As  we  have  said,  however,  most  relations  R  of  interest 
are  testable  in  BPP. 

Finally,  we  note  that  our  definition  of  knowing  how  to  generate  given  tjj  is 
somewhat  similar  to  the  definition  of  probabilistic  relative  knowledge  defined 
in  [FZ87].  The  only  significant  difference  is  that  they  define  knowing  how 
to  generate  relative  to  a  particular  Turning  machine  M,  whereas  we  define 
knowing  how  to  generate  relative  to  a  fact  iff.  Roughly  speaking,  taking  V'Af 
to  be  the  fact  true  at  points  where  the  test  M  outputs  with  probability  2/3  a 
y  satisfying  R{x,  y),  knowing  how  to  generate  relative  to  M  and  knowing  how 
to  generate  given  V'at  coincide.  The  natural  generalization  of  our  definition  to 
practically  knowing  how  to  generate  (where  we  allow  the  generator  to  make 
mistakes,  but  only  on  a  negligible  fraction  of  the  runs)  differs  in  subtle  ways, 
however,  from  the  generalization  given  by  Fischer  and  Zuck. 
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5.7  Resource-bounded  provers 

In  an  interactive  proof  system  as  defined  in  [GMR89],  the  prover  is  assumed 
to  be  infinitely  powerful.  In  practice,  however,  a  prover  is  not  infinitely 
powerful  and  may  have  no  more  computational  power  than  the  verifier.  For¬ 
tunately,  a  probabilistic,  polynomial-time  prover  with  some  “secret  informa¬ 
tion”  on  its  work  tape  is  able  to  carry  out  many  of  the  interesting  interactive 
protocols.  In  the  case  of  the  graph  isomorphism  protocol  from  [GMW86]  dis¬ 
cussed  in  the  introduction,  for  example,  this  secret  information  is  an  isomor¬ 
phism  between  the  graphs  on  the  input  tape.  Since  the  context  of  such  weak 
(polynomial-time)  provers  is  actually  the  context  of  most  practical  interest, 
the  type  of  security  afforded  by  zero  knowledge  protocols  in  this  context  is 
an  important  question,  and  the  subject  of  our  final  section. 

In  order  to  study  zero  knowledge  proofs  in  this  context,  we  define  the  no¬ 
tion  of  a  weak  interactive  proof  system,  a  direct  modification  of  the  definition 
of  an  interactive  proof  system  for  L.  We  define  a  weak  interactive  protocol 
to  be  an  interactive  protocol  (P,  F)  where  both  P  and  V  run  in  probabilis¬ 
tic,  polynomial-time.  We  define  a  weak  interactive  proof  system  (P,  V)  for 
a  language  L  just  as  we  defined  an  interactive  proof  system  for  L  except 
that  we  require  (P,  F)  to  be  a  weak  interactive  protocol  and  we  restrict  the 
quantification  of  P*  in  the  soundness  condition  to  be  only  over  probabilistic, 
polynomial-time  machines,  rather  than  over  all  machines.  As  the  following 
lemma  shows,  however,  weak  interactive  proofs  of  language  membership  are 
not  very  interesting. 

Lemma  5.8:  There  is  a  weak  interactive  proof  system  for  L  iff  L  is  in  BPP. 

Thus,  an  interesting  weak  interactive  proof  cannot  be  simply  a  proof 
of  language  membership;  it  must  reveal  something  about  the  prover’s  local 
state,  and  hence  must  reveal  something  about  the  prover’s  knowledge  since 
the  prover’s  knowledge  is  determined  by  its  local  state.  Consider  again  the 
zero  knowledge  proof  of  graph  isomorphism  from  [GMW86]  discussed  in  the 
introduction,  or  the  zero  knowledge  proof  of  three- colorabiUty  also  given  in 
[GMW86].  Both  proofs  can  be  carried  out  by  a  weak  prover  with  the  appro¬ 
priate  information  on  its  work  tape,  and  in  both  cases  the  verifier  obtains 
some  information  about  the  prover’s  knowledge  as  well  as  about  language 
membership.  In  the  case  of  graph  isomorphism,  the  verifier  learns  that  with 
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high  probability  the  prover  can  generate  an  isomorphism  between  the  graphs 
in  question.  Similarly,  in  the  case  of  three-colorability,  the  verifier  learns  that 
with  high  probability  the  prover  can  generate  a  three  coloring  of  the  graph  in 
question.  It  is  well-known  (see  [HM84,  MDH86])  that  information  about  the 
provar’s  knowledge  can  dramatically  affect  the  verifier’s  knowledge  about  the 
world.  For  example,  in  the  case  of  three-colorability,  information  about  the 
prover’s  knowledge  may  indicate  to  the  verifier  that  the  prover  has  with  high 
probability  communicated  with  the  entity  that  generated  the  three-colorable 
graph. 

In  order  to  study  proofs  of  the  prover’s  knowledge,  we  extend  the  defini¬ 
tion  of  a  weak  interactive  proof  of  language  membership  to  that  of  a  weak 
interactive  proof  about  the  prover’s  initial  state,  where  a  fact  is  a  fact  about 
the  prover’s  initial  state  if  it  depends  only  on  the  prover’s  initial  state  as 
defined  in  Chapter  2.  Since  the  prover’s  initial  state  is  determined  by  its 
protocol  P*,  its  initial  work  tape  5,  and  the  common  input  x,  it  is  conve¬ 
nient  to  think  of  these  components  as  parameters  and  denote  facts  about 
the  prover’s  initial  state  by  (p{P*,x,s).  The  definition  of  a  weak  interactive 
proof  of  (p{P*,x,a)  is  obtained  simply  by  replacing  all  occurrences  of  x  6  i 
by  v)(P*,  a)  in  the  definition  of  a  weak  interactive  proof  of  language  mem¬ 
bership.  Formally,  we  define  a  weak  interactive  proof  system  for  a  fact  <p 
about  the  prover’s  initial  state  to  be  a  weak  interactive  protocol  (P,  V)  such 
that 

•  Completeness:  For  every  k  and  sufficiently  large  x,  and  for  every  s 
and  t,  if  <p{P,  x,  s)  then 

pr[(P(s),  V{t)){x)  accepts]  >  1  —  lx|~*  . 

•  Soundness:  For  every  k  and  sufficiently  large  x,  for  every  probabilistic, 
polynomial-time  P*,  and  for  every  s  and  t,  if  -'y?(P*,x,a)  then 

pr[(P*(a),  F(<))(x)  accepts]  <  |x|“* . 

The  reader  may  wonder  why  we  consider  weak  interactive  proofs  of  facts 
about  the  prover’s  initial  state  that  depend  on  the  prover’s  protocol  as  well 
as  its  work  tape.  To  see  why,  suppose  y?(x,  a)  is  a  fact  about  the  prover’s 
work  tape  and  the  common  input;  that  is,  the  truth  of  <p{x^  s)  depends  only 
on  the  prover’s  work  tape  a  and  the  common  input  x  (and  not  on  the  prover’s 
protocol).  Let  us  define  dom{(p)  to  be  the  set  {x  :  <p{x,3)  for  some  a}. 
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Lemma  6.9;  A  weak  interactive  protocol  (P,  V)  is  a  weak  interactive  proof 
system  for  a  fact  tp  about  the  prover’s  work  tape  and  the  common  input  iff 

1.  for  all  sufficiently  large  *  and  for  all  a,  we  have  tp(x,  a)  ilf  *  G  dom{(p)] 
and 

2.  dom{(p)  is  in  BPP. 

This  lemma  says  that  if  there  is  a  weak  interactive  proof  of  a  fact  R  about 
the  prover’s  work  tape  and  the  common  input,  then  R  is  essentially  uninter¬ 
esting.  In  particular,  with  the  exception  of  a  few  small  values  of  x,  y>(x,  a) 
holds  for  all  a  whenever  it  holds  for  any  a.  Consequently,  R  is  essentially 
determined  by  dom{<p).  Since  dom{ip)  is  in  BPP,  the  prover  can  determine 
whether  R  holds  (for  sufficiently  large  x)  without  even  interacting  with  the 
prover.  Consequently,  a  fact  R  about  the  prover’s  initial  state  having  only 
nontrivial  weak  interactive  proofs  must  necessarily  be  a  fact  depending  on 
the  prover’s  protocol,  and  hence  on  the  prover’s  entire  initial  state.  Since  the 
prover’s  knowledge  is  determined  by  its  local  state,  such  a  weak  interactive 
proof  may  be  viewed  as  a  proof  of  the  prover’s  knowledge.  In  fact,  we  note 
that  even  in  the  context  of  infinitely  powerful  provers  an  interactive  proof  of 
X  €  L  is  not  just  a  proof  of  x  &  L  but  a  proof  the  prover  knows  *  6  L  (i.e., 
a  proof  of  the  prover’s  knowledge).  The  fact  that  all  interesting  interactive 
proofs  must  be  proofs  of  the  prover’s  knowledge  is  obscured  in  the  context 
oi  infinitely  powerful  provers  since  x  €  L  holds  iff  the  prover  knows  x  €  L. 
In  the  context  of  weak  prover,  however,  these  fact,  are  not  equivalent. 

We  have  defined  a  natural  notion  of  interactive  proof  in  the  context  of 
weak  provers,  and  we  have  shown  that  the  only  nontrivial  interactive  proofs 
in  this  context  are  proofs  about  the  prover’s  knowledge.  While  our  definition 
is  a  direct  modification  of  the  definition  in  the  case  of  strong  provers,  it 
is  not  initially  clear  that  our  definition  is  the  most  appropriate  (or  at  all 
appropriate)  in  the  context  of  weak  provers,  it  is  possible  that  our  results 
are  merely  artifacts  of  our  definition.  As  evidence  supporting  our  definition, 
we  now  show  that,  under  certain  natural  conditions,  both  interactive  proof 
systems  involving  weak  provers  that  have  appeared  in  the  literature  [FFS87, 
TW87]  are  instances  of  weak  interactive  proofs.  Not  surprisingly,  in  light 
of  our  previous  results,  these  proof  systems  concern  proofs  of  the  prover’s 
knowledge. 
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In  [TW87]  we  find  the  following  definition  (modified  slightly  for  the  sake 
of  consistency  with  the  rest  of  this  chapter).  Given  a  binary  relation  R,  a 
weak  interactive  protocol  (P,  V)  is  said  to  be  an  interactive  proof  that  the 
prover  can  generate  some  y  satisfying  R{Xfy)  if  the  following  conditions  ate 
satisfied; 

•  Completeness:  For  every  k>l  and  sufficiently  large  z  and  for  every  s 
and  t,  if  R{x,s),  then 

pr[(P(s),  V(<))(®)  accepts]  >  1  -  I®]"* . 

•  Soundness:  For  every  probabilistic,  polynomial- time  P*  there  is  a  prob¬ 
abilistic  Turing  machine  Mp.  running  in  time  polynomial  in  |x|  such 
that  for  every  k>l  and  sufficiently  large  x  and  for  all  s  and  t, 

pr\y  accepts  at  (r,m)  D  i2(x,  Mp.(rp(m)))]  >  1  —  jx]”* 

where  the  probability  is  taken  over  the  runs  r  of  (P*(s),y(t))(x)  and 
the  coin  flips  of  Mp*!’ 

While  we  would  like  to  show  that  every  interactive  proof  that  the  prover 
can  generate  some  y  satisfying  i2(x,  v)  is  a  weak  interactive  proof,  this  is  not 
quite  true.  To  see  this,  notice  that  the  definition  of  a  weak  interactive  proof 
requires  that  the  probability  with  which  {P{s),V{t)){x)  accepts  is  very  close 
to  0  when  R{x,3)  fails  to  hold,  while  an  interactive  proof  of  [TW87]  allows 
the  probability  with  which  {P{s),  F(t))(x)  accepts  to  be  arbitrary  as  long  as 
the  prover  P  is  able  to  generate  a  y  satisfying  R{x,y).  For  example,  if  P  is 
able  to  generate  a  y  satisfying  R{x,y)  with  probability  1  at  all  points  of  the 
system,  then  pr\y  accepts  at  (r,tn)  D  R{x^Mp*{Tp{rn)))]  ==  1  regardless  of 
the  probability  with  wliich  the  verifier  accepts.  We  will  prove  below,  however, 
that  the  following  is  a  necessary  and  sufficient  condition  for  an  interactive 
proof  of  [TW87]  to  be  a  weak  interactive  proof: 

•  Correctness:  For  every  k>  1  and  sufficiently  large  x  and  for  every  s 
and  t,  if  R{x,  s)  does  not  hold,  then  pr[(P(s),  V’(f))(x)  accepts]  <  jx]"*. 

'^We  note  that  the  soundness  condition  in  [TW87]  actually  quantifies  over  all  Turing 
machines  P*  and  not  just  over  polynomial-time  P*.  This  is  done  for  technical  complexity- 
theoretic  reasons.  Since,  however,  the  motivation  for  considering  weak  proven  is  that 
in  practice  all  agents  are  restricted  to  polynomial-time,  our  restriction  does  not  seem 
unnatural. 
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Intuitively,  the  good  prover  “tries”  to  convince  the  verifier  to  accept  only 
when  R{Zi  s)  holds.  It  is  easy  to  show  that,  given  an  interactive  proof  of 
[TW87j,  this  interactive  proof  can  be  modified  to  satisfy  the  correctness 
condition  iff  R{z,y)  is  testable  in  BPP:  the  modification  simply  has  the 
prover  run  the  BPP  test  in  order  to  determine  whether  is  should  attempt 
to  convince  the  verifier  to  accept.  Since  this  seems  to  be  the  most  relevant 
context  in  practice  (the  relations  used  in  the  examples  in  [TW87]  are  testable 
in  BPP,  and  fFFS87]  explicitly  restricts  to  determinisiic  polynomial-time 
relations^),  this  seems  to  imply  that  the  correctness  condition  is  a  natural 
restriction.  In  the  following  proposition  we  show  that  (P,  V)  is  an  interactive 
proof  of  {TW87]  for  a  relation  R  satisfying  the  correctness  condition  iff  it  is 
a  weak  interactive  proof  of  the  fact  tpn  defined  by 

^  (P-  =  PhR(x,>))M 

(P*  ^  P  h  Hke  soundness  condition  holds  for  P**) 

Note  that  ipR  depends  on  the  prover’s  protocol  as  well  as  the  work  tape,  and 
is  a  fact  about  the-  prover’s  initial  state.  Of  course,  ipR  is  not  necessarily 
testable  in  BPP. 


Proposition  5.10:  (P,  V)  is  an  interactive  proof  satisfying  the  correctness 
condition  that  the  prover  can  generate  a  y  such  that  R{x,  y)  iff  (P,  V)  is  a 
weak  interactive  proof  system  for  (pR. 

We  can  show,  in  addition,  that  the  proof  systems  of  [FFS87]  satisfying 
the  correctness  condition  above  are  also  instances  of  a  weak  interactive  proof 
system.  The  following  is  an  interpretation  of  the  quite  informal  definition  of 
an  interactive  proof  given  in  [FFS87]: 

•  Completeness'.  For  every  Jb  >  1  and  sufficiently  large  x  and  for  every  s 
and  t,  if  R{x,  s),  then 

pr[(P(s),  V’(t))(a:)  accepts]  >  1  —  |®|“* . 

^[Slo89]  shows  that  certain  anomalies  in  the  definition  of  an  interactive  proof  in  [FFS87] 
disappear  when  the  deterministic  restriction  is  removed. 
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*  Soundness:  For  every  k>l  there  exists  a  probabilistic  Turing  machine 
Mk  such  that  for  every  P*  and  /  >  1  and  sufSciently  large  x,  and  all  a 
and  t, 

pr[(P'‘,  V)  accepts]  >  Ixl”** 

implies 

pr{P(»,ii4(P%x))]  >  1  -  1x1"^ . 

Here  Mk  is  given  the  “code”  for  P*‘  and  is  allowed  to  run  in  time 
polynomial  in  x,  the  running  time  of  P*,  and  the  length  of  the  “code” 
for  P\ 

It  is  not  hard  to  show  that  such  an  interactive  proof  is  also  an  interactive 
proof  of  a  fact  similar  to  (pR.  We  leave  the  proof  to  the  reader. 

In  light  of  the  preceding  propositions,  our  definition  of  a  weak  interactive 
proof  system  seems  to  be  an  appropriate  definition;  it  can  at  least  capture 
the  definitions  of  other  proof  systems  dddned  in  the  context  of  polynomial¬ 
time  provers.  We  now  turn  to  the  study  of  the  security  afforded  by  such 
protocols.  Our  definition  of  a.  weak  interactive  proof  is  a  direct  modification 
of  the  definition  of  an  interactive  proof  of  language  membership.  We  can 
also  directly  modify  the  definition  of  a  zero  knowledge  proof  of  language 
membership  to  obtain  a  definition  of  a  zero  knowledge  weak  interactive  proof: 
a  weak  interactive  proof  (P,  V)  is  said  to  be  zero  knowledge  if  for  every  V* 
there  exists  a  Turing  machine  My  such  that  the  families 

{(P(»),  V(I))M  =  t,  z)  S  Dom} 


and 

'  (P»«>  F’*»t,x)  €  Dom} 

are  polynomially  indistinguishable,  where  (P,s,  V’*,t,x)  €  Dom  iff  V*  is  a 
possible  verifier  protocol,  s  and  t  are  possible  work  tapes,  and  V7(P,s,x). 

Not  surprisingly,  analogues  of  all  our  previous  results  for  interactive  proo& 
hold  in  the  case  of  weak  interactive  proofs,  with  essentially  the  same  proofs. 
Rather  than  restating  all  the  results  here,  we  focus  on  one  of  them,  the 
analogue  of  Proposition  5.1.  If  ^  is  a  fact  about  the  prover’s  initial  state, 
then  we  say  (r.m)  1=  tp  if  ^(P*,x,s),  where  P*  is  the  protocol  that  p  is 
running  in  r,  x  is  the  common  input  in  the  initial  state  r(d),  and  s  is  the 
contents  of  p’s  work  tape  in  r(0). 
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Proposition  5.11:  A  weak  interactive  pr  ^tocol  (P,  V)  is  a  weak  interactive 
proof  system  for  a  fact  tp  about  the  prover’s  initial  state  iff  the  following 
conditions  are  satisfied: 

•  Completeness'.  For  every  k  there  exists  a  such  that 

P  X  F  i=  inti  D  Pr[v>  D  Oacccpi]  >  1  —  a  I®!”* 

•  Soundness:  For  every  k  there  exists  a  such  that 

X  F  [=  init  D  PrfOocccpt  D  ^]  >  1  —  a  j®!”*' . 

Thus,  we  have  replaced  the  occurrences  of  ®  €  i  in  Proposition  5.1  by  <p,  and 
used  rather  than  V  in  the  soundness  condition  since  we  are  restricting 
to  weak  provers. 

At  this  point,  we  can  make  an  interesting  observation  about  the  definition 
of  interactive  proof  systems.  Notice  that  in  our  soundness  condition,  the 
meaning  of  “sufficiently  large  ®”  (that  is,  the  value  of  Nk)  depends  only 
on  the  value  of  k  and  not  on  the  choice  of  P*.  In  early  versions  of  the 
definition  of  an  interactive  proof  given  in  [GMR89],  it  is  not  clear  whether 
the  dependence  is  on  k  alone  or  on  both  h  and  P*.  But  as  Shaft  Goldwejsser 
pointed  out  to  us,  in  the  case  of  infinitely  powerful  provers,  it  doesn’t  matter 
what  choice  we  make.  More  formally,  in  the  context  of  language  recognition, 
an  interactive  proof  system  (P,  F)  is  sound  with  respect  to  one  choice  iff 
it  is  sound  with  respect  to  the  other.  The  proof  of  this  observation  is  a 
consequence  of  Feldman’s  proof  technique  for  proving  that  it  is  suftHcient 
to  assume  the  prover’s  computational  powers  are  limited  to  PSPACE  [Fel]: 
we  can  construct  a  cheating  PSPACE  prover  that,  at  any  point  during  a 
conversation  with  the  verifier  F,  can  try  all  possible  answers  to  the  verifier’s 
latest  question,  compute  which  answer  vrill  cause  the  verifier  to  accept  with 
the  greatest  probability,  and  send  this  answer  to  the  verifier. 

In  the  case  of  weak  provers,  however,  the  order  of  quantification  in  the 
statement  of  soundness  is  important.  In  particular,  if  we  had  stated  our 
soundness  condition  so  that  the  choice  of  “sufficiently  large  ®”  might  depend 
on  the  protocol  P*,  all  we  would  be  able  to  prove  is  that  for  every  k  and 
every  protocol  P*,  there  exists  a  such  that 

P*  X  F  [=  init  D  Pr\Oaccept  D  —  oc  1®!“* . 
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Instead,  we  can  prove  that  for  every  k  there  exists  an  a  such  that 

X  V  tnifi  D  Pri'y accept  D  ^]  >  1  —  a  Ixl”* . 

The  first  statement  says  that,  for  every  prover,  as  long  as  the  verifier  knows 
the  identity  of  the  prover,  (p  is  true  whenever  the  verifier  accepts.  The  second 
statement,  on  the  other  hand,  says  that  no  matter  who  the  prover  is,  (p  is 
true  whenever  the  verifier  accepts,  which  is  clearly  the  desired  statement. 
We  remark  that  the  weak  interactive  protocols  resulting  from  tha  interactive 
proofs  and  zero  knowledge  proofs  we  are  aware  of  satisfy  the  t trouper  notion 
of  soundness  we  have  used  in  our  definition,  and  the  revised  definition  of  a."*, 
interactive  proof  appearing  in  (GMR89]  is  consistent  witl<  definition  w«= 
use. 

In  addition  to  proving  the  analogues  of  results  holdiA’  *n  fbe  to.ifext  of 
strong  provers,  we  can  reason  about  the  interactive  proofs  of  'FI'  *?/,  TVv'd?] 
directly  in  terms  of  the  notions  of  knowledge  and  generation  Wo  ]:a»e  I'liiined 
in  previous  sections.  For  example,  we  can  characterize  proofs  that  the  prover 
can  ge;:.3rate  some  y  satisfying  i2(x,  y)  just  as  we  characterized  interactive 
proofs,  in  the  case  that  i2(®,y)  is  testable  in  BPP. 

Proposition  5.12:  Given  a  relation  R(x,y)  testable  in  BPP,  a  weak  inter¬ 
active  protocol  (P,  y)  is  a  weak  interactive  proof  that  the  prover  can  generate 
some  y  satisfying  R{x,  y)  iff  the  following  conditions  are  satisfied: 

•  Completeness:  For  every  k  there  exists  a  such  that 

.P  X  y  1=  init  D  Pr[i2(®,s)  D  Oaccept]  >  1  —  a 

•  Soundness:  For  every  probabilistic,  polynomial-time  P*, 

P*  X  y  1=  accept  D  Gfy.R{x,y) 

where  V*  is  the  fact  halted  that  the  verifier  has  halted. 

Notice  that  in  the  soundness  condition,  we  have  accept  D  G^y.R{x,y)  rather 
than  O accept  D  G^y.R{x,y).  The  first  condition  says  that  the  prover  can 
generate  some  y  such  that  R{x.y)  at  the  point  when  the  verifier  accepts,  as 
required  by  [TW87],  and  not  at  the  initial  point  as  would  be  the  case  with 
the  second  clause.  This  is  one  of  the  differences  between  the  definitions  of 
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[TW87]  and  [FFS87].  A  second  difference  between  the  two  definitions  is  that 
the  soundness  condition  of  [FFS87]  is  such  that  we  can  state  the  soundness 
condition  above  in  terms  of  the  system  x  V  instead  of  P*  X  V.  We 
remark  that  because  the  machine  M{P*t  x)  guaranteed  by  the  definition  of  an 
interactive  proof  in  [FFS87]  runs  in  time  polynomial  in  |x|,  the  running  time 
of  P*,  and  the  length  of  the  encoding  of  P*,  we  must  modify  the  definition 
of  Gfy.R{x,y)  to  say  that  the  generating  Turing  machine  also  runs  in  these 
parameters  in  order  to  reason  about  this  definition  of  an  interactive  proof. 
This  modification  is  the  same  modification  needed  to  reason  about  notions 
of  aero  knowledge  other  than  strong  black-box  zero  knowledge. 


5.8  An  Application 

In  preceding  sections  we  have  characterized  interactive  proof  systems  in  terms 
of  knowledge.  As  an  example  of  how  to  reason  about  interactive  proof  sys¬ 
tems  in  terms  of  knowledge,  we  show  how  to  prove  the  familiar  result  that 
the  sequential  composition  of  an  interactive  proof  of  x  €  L  followed  by  an 
interactive  proof  of  x'  €  is  an  interactive  proof  of  (x,x')  G  L  x  U. 

For  expository  simplicity,  we  have  been  studying  interactive  protocols 
(P,  V)  in  isolation.  However,  as  shown  by  the  coin  flipping  example  in  the 
introduction  motivating  interest  in  zero  knowledge  in  the  first  place,  inter¬ 
active  protocols  are  not  used  in  isolation.  They  are  intended  to  be  used  as 
subroutines  or  building  blocks  in  the  construction  of  other  protocols.  Pro¬ 
viding  a  general  definition  of  what  it  means  for  one  protocol  to  be  used  as  a 
subroutine  in  another  protocol  is  a  difficult  problem.  It  is  not  too  difficult, 
however,  to  define  the  sequential  composition  of  two  protocols. 

Loosely  speaking,  if  P  and  Q  are  two  protocols,  their  sequential  compo¬ 
sition  P;  Q  should  correspond  to  first  running  the  protocol  P  until  it  halts 
(if  ever)  and  then  running  the  protocol  Q.  Recall  that  a  protocol  is  actually 
a  tuple  of  local  protocols,  one  for  each  agent  in  the  system,  and  that  a  local 
protocol  consists  of  state,  message,  and  action  protocols.  We  will  define  the 
composition  of  two  message  protocols  A  and  B.  The  composition  of  message 
and  action  protocols  is  similar,  and  the  composition  of  local  protocols  and 
protocols  will  immediately  follow. 

We  can  assume  without  loss  of  generality  that  the  domains  dom{A)  and 
dom{B)  of  A  and  B  (that  is,  the  sets  of  local  states  on  which  the  functions 
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A  luid  B  are  defined)  are  disjoint.  Let  halt{A)  and  start{B)  be  the  halt 
states  of  A  and  start  states  of  5,  respectively.  The  only  real  problem  in  the 
definition  of  A;  B  is  how  the  composition  should  move  from  a  halt  state  of 
i4  to  a  start  state  of  B.  In  the  case  of  mleractive  protocols,  for  example,  it 
seems  most  natural  to  require  that  the  states  of  the  communication  tapes, 
work  tapes,  and  random  tapes  encoded  in  a  local  state  remain  the  same, 
and  that  the  only  thing  that  changes  is  that  the  state  of  the  Turing  machine 
describing  the  prover  or  verifier’s  protocol  changes  from  a  halt  state  of  the 
first  protocol  to  the  start  state  of  the  second.  This  can  be  described  by  a 
function  /  from  halt{A)  to  start{B).  The  sequential  composition  A\  B  oi  A 
and  B,  given  /,  is  defined  by 

i4(s,  m)  if  s  6  dom{A)  —  halt{A) 

A‘,B{s,m)  =  <  f{s)  if  s  6  halt{A) 

B(s,m)  if  5  €  dom{B) 

(Remember  that  a  state  protocol  A  maps  a  locd  state  s  and  a  vector  m  of 
messages  received  from  other  protocols  to  a  local  state  A{a,m).) 

Having  defined  sequential  composition,  we  now  show  that  the  sequential 
composition  of  two  interactive  proofs  is  an  interactive  proof.  Suppose  (Pi,  l^i) 
and  (P2,  Vj)  are  interactive  proofs  for  Li  and  L2,  respectively.  Recall  that 
we  assume  the  prover  and  verifier  maintain  on  their  work  tapes  a  complete 
history  of  the  local  states  they  pass  through  during  the  course  of  a  run. 
Notice  that  a  trivial  modification  of  these  proof  systems  results  in  proof 

A  A 

systems  for  the  languages  L\  =  L\  X.  S*  and  L2  =  S*  x  Z12,  respectively, 
where  S  =  {0, 1}.  Let  us  abuse  notation  and  denote  these  new  proof  systems 
by  (PijVi)  and  (P2,V2)  as  well.  Finally,  let  (P,  F)  =  (Pii  P2,  VI;  V2)  be  the 
sequential  composition  of  the  two  proof  systems.  We  now  sketch  a  proof  that 
(P,  V)  is  an  interactive  proof  system  for  L  =  L\X  L^. 

First,  we  note  that  it  is  easy  to  prove  the  following: 

Claim  5.13: 

P  X  V  [=  (®  €  Li  A  *p  running  P’)  D  OK^{x  €  Li) 

def  ^ 

where  tj)  =  halted  A  ‘p  running  P\ 

To  see  this,  notice  that  since  (Pi,  Vi)  is  an  interactive  proof  for  Li,  Proposi¬ 
tion  5.3  says 

T  xVi  [=  (x  €  Li  A  *p  running  Pi)  D  Okp{x  €  Li) 
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where  ^*1  '=  halted  A  ‘p  running  Pi.  It  is  clear  that  any  test  M  in  V  x  Vi 
for  X  6  Li  that  is  practically  sound  and  practically  complete  given  can  be 

A  A 

extended  to  a  test  M  in  V  y.V  that  is  sound  and  practically  complete  given 
V*:  the  test  M  simply  searches  its  work  tape  for  the  most  recent  local  state 
in  which  the  verifier  was  running  Vi,  runs  M  in  this  state,  and  accepts  iff  M 
accepts. 

It  is  a  bit  harder  to  prove  that 
Claim  5.14: 


7?  X  F  [=  (a:  €  i/2  A  ‘p  running  P’)  D  OK^{x  6  L2), 
where  ^  halted  A  ‘p  running  P\ 

To  prove  this,  we  observe  that  since  (P2,V2)  is  an  interactive  proof  for 
Proposition  5.3  says 

P  X  V2  h  (®  €  i(2  A  ‘p  running  P2)  D  0^^(aj  6  L2) 


where  V’s  halted  A  ‘p  running  P2*.  We  want  to  say  that  any  test  M  in 
P  X  V2  for  a;  €  ^2  that  is  sound  and  complete  given  ^2  can  be  extended 
to  a  test  M  in  P  X  y  for  ®  €  i2-  The  test  M  is  defined  as  follows.  Since 
y  =  Vi;V2,  it  is  easy  to  see  that  there  is  a  natural  mapping  h  mapping  a 
point  c  of  P  X  y  in  which  the  verifier  is  running  V2  to  a  point  d  of  P  X  V^. 

A 

This  mapping  essentially  discards  that  portion  of  a  run  of  P  x  y  up  to  the 
point  V2  is  started,  erasing  everything  on  the  communication  and  random 
tapes  that  is  written  before  the  beginning  of  leaving  the  input  and  work 
tapes  unchanged.  The  test  M  rejects  at  a  point  if  the  verifier  is  still  following 
Vi,  and  at  all  other  points  c  runs  the  test  M  on  the  point  h(c).  The  problem 
is  showing  that  M  is  practically  sound  and  practically  complete  given 
To  do  this,  we  have  to  relate  the  probability  spaces  used  in  P  x  y  to 
evaluate  formulas  like  pr[<p]  >  a  to  the  probability  spaces  used  in  P  X 
It  is  easy  to  see  that,  extending  h  to  sets  in  the  obvious  way,  h  maps  5,',e  to 
Si,d  (where  d  =  h{c))  and  measurable  sets  of  Si,e  to  measurable  sets  of  5',-,^ 
with  the  same  measure.  Furthermore,  the  fact  x  E  L  holds  at  c  iff  it  does  at 
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inV  X  V,  and  hence  that  init  D  pr[^]  >  a  is  valid  in  'P  x  V.  Consequently, 
the  fact  that  M  is  practically  sound  inP  xVt  implies  that  M  is  practically 
sound  iTxPxV^  and  similarly  for  practical  completeness  given  V*'  This  proves 
Claim  5.14. 

Given  the  two  Claims  5.13  and  5.14,  we  know  that  the  two  formulas 
{x  &L\f\  ‘p  running  P')  D  0.^^(x  6  Li) 
and 

(x  €  i/2  A  ‘p  running  P’)  D  Okf{x  €  L2) 

A  AAA 

are  valid  inP  xV.  Notice  that  x  €  L  implies  x  6  Lj  and  x  €  L2)  and  that, 
since  K^{x  €  Li)  and  K^{x  €  i»2)  are  stable  formulas  (once  they  become 
true  they  remain  true),  OKf(x  £  Li)  A  OK^{x  €  L2)  implies  OK^{x  €  L). 
It  follows  that 

Corollary  5.15: 

P  xV  \={x  £  L  A*p  running  P*)  D  Ok^{x  6  i), 

where  V’  —  halted  A  ‘p  running  P\ 

Finally,  by  Proposition  5.4  we  have 

Proposition  5.10;  The  interactive  protocol  (P,  V)  can  be  effectively  mod¬ 
ified  to  obtain  an  interactive  proof  for  the  language  L. 

5.9  Conclusion 

The  main  contribution  of  this  work  lies  in  suggesting  notions  of  knowledge 
appropriate  for  interactive  proofs,  characterizing  interactive  proofs  in  terms 
of  these  notions,  and  proving,  again  in  terms  of  these  notions,  that  the  prover 
in  a  zero  knowledge  proof  system  does  not  leak  any  information  other  than 
the  fact  it  set  out  to  prove.  Roughly  speaking,  we  have  shown  that  a  zero 
knowledge  proof  system  for  x  G  i  satisfies  the  following  property,  which  we 
call  knowledge  security:  the  prover  is  guaranteed  that,  with  high  probability, 
if  the  verifier  will  practically  know  a  fact  at  the  end  of  the  proof,  it  prac¬ 
tically  knows  X  £  L  Dip  aX  the  start.  We  have  also  formalized  the  notion  of 
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knowing  how  to  generate,  and  shown  that  zero  knowledge  proofs  also  satisfy 
an  analogous  property  of  generation  security.  (The  precise  formulations  of 
knowledge  and  generation  security  are  provided  by  the  statements  of  Theo¬ 
rems  5.5  and  5.7.)  It  is  currently  an  open  question  whether  either  of  these 
notions  of  security  characterizes  zero  knowledge  (that  is,  say,  whether  an 
Interactive  proof  that  satisfies  the  property  of  knowledge  security  is  also  a 
zero  knowledge  proof).  We  can  show,  however,  that,  in  the  context  of  finite 
state  protocols,  any  protocol  that  satisfies  the  knowledge  security  property 
is  recognition  zero  knowledge^  as  defined  in  [DS88].  We  consider  the  prob¬ 
lem  of  characterizing  zero  knowledge  in  terms  of  knowledge  instead  of  simply 
stating  necessary  conditions  for  zero  knowledge  (knowledge  and  generation 
security)  to  be  an  important  problem. 

We  have  sketched  in  Section  5.8  an  example  of  how  practical  knowledge 
can  be  used  to  reason  about  cryptographic  protocols  like  interactive  proof 
systems.  A  second  important  problem  left  unsolved  by  this  chapter  is  that 
of  developing  more  sophisticated  tools  for  reasoning  about  practical  knowl¬ 
edge  (and,  for  that  matter,  knowing  how  to  generate)  that  will  be  needed 
in  order  to  be  able  to  prove  more  sophisticated  results  about  cryptography 
in  terms  of  knowledge.  In  Chapter  3  we  were  able  to  use  fairly  powerful 
proof  rules  like  the  induction  rule  to  reason  about  information-theoretic  def¬ 
initions  of  knowledge,  a  rule  that  is  essentially  the  translation  of  theorems 
from  recursion  theory  into  statements  about  knowledge.  In  the  case  of  prob¬ 
abilistic  knowledge,  it  is  possible  to  translate  many  results  theorems  about 
measure  theory  into  proof  rules  for  probabilistic  knowledge  (see  [FH88]  for 
a  number  of  examples).  But  because  the  definition  of  practical  knowledge 
depends  on  Turing  machines,  powerful  proof  rules  for  reasoning  about  prac¬ 
tical  knowledge  are  going  to  require  genereil  results  about  computation  and 
computational  complexity.  Some  simple  proof  rules  such  as  '‘From  Kptp\ 
and  Kpip2  infer  A  tpj)”  are  quite  easy  to  prove  valid.  But  we 

have  seen  in  Section  5.3.2  and  the  work  of  [Mos88]  that  proof  rules  such  as 
“From  K^ip  and  D  ^')  infer  K^<p'”  are  not  necessarily  valid.  Un¬ 

der  what  conditions  are  such  rules  valid?  It  is  not  clear  at  the  moment 
how  different  reasoning  about  such  conditions  and  using  the  resulting  proof 
rules  will  be  from  making  such  inferences  by  reasoning  directly  in  terms  of 
the  operational,  cryptographic  definitions  in  the  first  place.  Moreover,  we 
W2int  to  be  able  to  reason  about  interactive  protocols  in  isolation,  and  use 
these  results  to  reason  about  protocols  making  use  of  interactive  protocols 
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as  subroutines.  This  means  that  we  want  to  be  able  to  prove  that  certain 
statements  about  knowledge  are  valid  in  a  system  corresponding  to  running 
an  interactive  protocol  in  isolation,  and  prove  that  these  same  statements  are 
true  in  another  system  at  all  points  at  which  the  interactive  proof  is  being 
run  as  a  subroutine.  But  we  do  not  seem  to  have  at  the  moment  very  so¬ 
phisticated  techniques  for  translating  statements  about  knowledge  from  one 
system  to  another,  although  the  mapping  h  used  in  Section  5.8  and  the  re¬ 
lated  notions  of  implementation  defined  by  Halpern  and  Fagin  in  [HF85]  and 
elaborated  by  Mazer  in  [Maz89]  are  a  good  initial  step  toward  this  goal. 

Nonetheless,  we  feel  that  these  security  results  shed  some  light  on  the  type 
of  security  that  zero  knowledge  proofs  provide.  Our  theorems  provide  support 
for  the  definitions  of  interactive  proofs  and  zero  knowledge  and  our  model 
provides  a  good  semantic  setting  for  such  an  analysis.  Some  of  the  definitions, 
chiefly  that  of  practical  knowledge,  are  quite  subtle.  Many  straightforward 
definitions  one  may  try  fail  by  being  inappropriate  for  the  cryptographic  set¬ 
ting  and  not  providing  a  useful  sense  in  which  zero  knowledge  proof  systems 
provide  security.  As  Feige,  Fiat,  and  Shamir  write  in  [FFS87],  “the  notion  of 
‘knowledge’  is  very  fuzzy,  and  a  priori  it  is  not  clear  what  proofs  of  knowl¬ 
edge  actually  prove.”  We  hope  to  have  established  a  framework  within  which 
such  questions  can  now  be  answered. 


5. A  Proofs  of  results 

We  end  this  chapter  with  an  appendix  in  which  we  prove  most  of  the  results 
claimed  in  this  chapter.  As  stated  in  the  text,  the  proofs  of  the  remain¬ 
ing  results  either  follow  immediately  from  preceding  results,  or  sure  virtually 
identical  to  the  proofs  of  the  preceding  results. 

Proposition  5.1:  An  interactive  protocol  (P,  V^)  is  an  interactive  proof  sys¬ 
tem  for  a  language  L  iff  the  following  conditions  are  satisfied: 

•  Completeness’.  For  every  fc  >  1  there  exists  ot  >  1  such  that 

P  xV  [=  init  D  Pr[x  &  LD  Oaccept]  >  1  —  a  |x|“* . 

•  Soundness:  For  every  k>l  there  exists  a  >  1  such  that 

V  xV  \=  init  D  Pr[Oaccept  Dx€L]>l  —  a  [xp* . 
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Proof:  First,  given  an  interactive  proof  system  (P,  V)  for  a  language  Zr,  we 
prove  that  the  two  conditions  above  are  satisfied.  Fix  k>l,  let  >  1  be 
the  constant  guaranteed  by  the  definition  of  an  interactive  proof  system,  and 
take  a  —  (iV*)*;  notice  that  1  —  a|x|”*  <  0  when  |x|  <  JV*. 

We  first  prove  that  the  completeness  condition  is  satisfied.  It  is  enough  to 
show  that  for  any  initial  point  c  of  P  X  V,  the  point  c  satisfies  the  formula 
defined  by  Pr[®  6  L  D  Oaccept]  >  1  —  ot  1®|~*.  Fix  one  such  point  c.  Notice 
that  fixing  c  implies  fixing  an  initial  global  state,  and  hence  fixing  values  for 
z,  s,  and  L,  then  all  points  with  c’s  global  state  satisfy  the  formula 

X  &  LD  Oaccept,  and  hence  c  satisfies  ^i.  Suppose  z  €  Z.  If  |x|  <  Nk,  then 
by  the  choice  of  a  we  have  1  —  a  |z|“*  <  0,  and  c  trivially  satisfies  ^i.  If 
I®  I  >  Wfc,  then  by  the  completeness  condi  ‘'■•n  for  interactive  proof  systems 
we  have  that  the  verifier  accepts  in  1  —  |i  p**  >  1  —  a  Izp**  of  the  runs  of 
(P(s),  V’(t))(z);  in  other  words,  Oaccept  holds  at  1  —  a|z|“*  of  the  points 
with  c’s  global  state,  and  c  satisfies  if)i. 

We  now  show  the  soundness  condition  is  satisfied:  Again,  it  is  enough  to 
show  that  for  any  initial  point  c  of  P  x  F,  the  point  c  satisfies  the  formula 

defined  by  Pr[Oaccept  Dz€Z]  >  1  —  a  |z|”*.  Fix  one  such  point  c. 
Again,  notice  that  fixing  c  implies  fixing  an  imtial  global  state,  and  hence 
fixing  values  for  P*,  z,  a,  and  t.  If  z  €  L,  then  all  points  with  c’s  global 
state  satisfy  the  formula  Oaccept  D  x  £  L,  and  hence  c  satisfies  V's*  Suppose 
X  ^  L.  If  |z|  <  Nk,  then  by  the  choice  of  a  we  have  1  —  a  |z|~*  <  0,  and  c 
trivially  satisfies  V's*  If  I®!  ^  Nk,  by  the  soundness  condition  for  interactive 
proof  systems  it  follows  that  the  verifier  accepts  in  at  most  |z|~^  of  the  runs 
of  P*  emd  V  on  input  z  with  work  tapes  a  and  t.  This  means  that  at  least 
1  —  |z|“*  >  1  —  a|®r*  of  the  points  with  c’s  global  state  fail  to  satisfy 
Oaccept,  and  hence  must  satisfy  Oaccept  D  x  £  L.  It  follows  that  c  satisfies 

V’2- 

Conversely,  given  (P,  F)  satisfying  the  two  conditions  above,  we  prove 
(P,  F)  is  an  interactive  proof  system  for  L.  Fix  ^  >  1,  let  a  >  1  be  the 
constant  guaranteed  by  the  two  conditions  above  for  2k,  and  take  ATj^  >  1  to 
be  large  enough  that  a  <  (W*)*;  notice  that  a  <  |z|*  when  |z|  >  Nk. 

We  first  show  the  completeness  condition  for  an  interactive  proof  system 
is  satisfied.  Consider  any  z,  a,  and  t  satisfying  x  £  L  and  |z|  >  Nk.  The 
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have  Pr[(P(s),  y(<))(a)  accepts]  >  1  —  I®]”*. 

We  now  prove  the  soundness  condition  for  an  interactive  proof  system  is 
satisfied.  Consider  any  P*,  ®,  s,  and  t  satisfying  x  ^  L  and  |®|  >  Nk-  Since 
X  ^  L,  the  soundness  condition  above  guarantees  that  the  verifier  fails  to 
accept  in  at  least  1  —  a  j®  of  the  runs  with  P*  and  V  on  input  ®  with  work 
tapes  s  and  t,  which  means  the  verifier  accepts  in  at  most  a|®|“**  <  1®|“* 
runs,  so  Pr[(P*(a),  V’(t))(®)  accepts]  <  j®!”*.  □ 

Proposition  5.3:  If  (P,  V)  is  an  interactive  proof  system  for  L,  then 

V  xV  \=  {x  e  L  A‘p  running  P*)  D  Ok*{x  G  L), 

where  if)  *=  halted  A  ‘p  running  P\ 

Proof:  Let  M  be  the  test  that  accepts  at  a  point  if  the  verifier  has  accepted 
at  that  point,  and  rejects  otherwise.  Suppose  we  can  show  that  M  is  practi> 
cally  sound  for  K„{x  6  L),  and  practically  complete  for  K„{x  6  L)  given  "ip. 
Then  we  can  complete  the  proof  of  this  proposition  as  follows.  Consider  any 
point  (r,  k)o{VxV  satisfying  ®  €  i A ‘p  running  P\  and  consider  any  final 
point  (r,  k')  of  r  with  k'  >  k.  Notice  that  (r,  k')  |=  and  (r,  k)  |=  Ky{x  G  L). 
Since  M  is  &  test  for  Ky{x  G  L)  that  is  sound  and  is  complete  given  "tp,  we 
have  (r,  k')  |=  ^^(®  G  L),  and  hence  (r,  k)  ]=  Ok^{x  G  L),  It  follows  that 

P  X  V  1=  (®  G  L  A  ‘p  running  P’)  D  Ok^{x  G  L), 

as  desired.  Thus,  all  we  need  to  prove  is  that  M  is  practically  sound  for 
K„{x  G  L),  and  practically  complete  for  Kv{x  G  L)  given  “0.  Since  Kv{x  G  L) 
is  equivalent  to  ®  G  L,  it  is  enough  to  prove  that  M  is  practically  sound  for 
a;  G  L,  and  practically  complete  for  ®  G  L  given  ip. 

To  see  that  M  is  practically  sound  for  ®  G  L,  fix  fc  >  1  amd  take  a  >  1 
to  be  the  constant  guaranteed  by  Proposition  5.1  to  satisfy 

P  X  V  1=  init  D  Pr[Ooccep<  D®G2/]>1  —  a  |®1~* . 

Notice  that  the  formula  Oaccept  D  x  E  L  implies  x  ^  L  D  -^accept,  which 
in  turn  implies  sound{M,  x  G  L).  Since  Oaccept  D  ®  G  L  is  a  fact  about  the 
run,  Oaccept  D  x  E  L  implies  □(Oaccept  D  ®  G  L),  which  in  turn  implies 
^3ound{M,  X  E  L).  It  follows  that 

P  X  y  1=  init  D  Pr[Osound{M, ®  G  L)]  >  1  —  a  |®1“* , 
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and  hence  M  is  sound  for  x€  L. 

To  see  that  M  is  practically  complete  for  x  €  L  given  fix  A:  >  1  and 
take  ot  >  1  be  the  constant  guaranteed  by  Proposition  5.1  to  satisfy 

P  xV  \=  inti  D  Pr[x  €  i  3  Oaccept]  >  1  —  a  |x|“* . 

Notice  that  the  formula  x  G  L  D  Oaccept  implies  if)  D  {x  E  L  D  A 
Oaccept)).  Since  the  formula  if)  A  Oaccept  is  equivalent  to  accept  (the  ver¬ 
ifier  has  already  accepted  or  rejected  at  points  satisfying  if),  namely  final 
points),  and  since  x  E  L  D  accept  implies  complete{M,x  E  L),  we  have 
if)  D  complete{M,  x  E  L).  Finally,  since  x  E  L  D  Oaccept  is  a  fact  about 
the  run,  x  E  L  D  Oaccept  implies  Of®  E  L  D  Oaccept]^  which  implies 
0[if)  D  compIete{M,x  E  L)].  It  follows  that 

P  X  ^  init  D  Pr[0[if)  D  complete{M,  x  E  Zr)]]  >  1  —  a  |*|~* , 

But  we  want  to  prove  that  this  formula  is  valid  in  the  system  V  x  V,  and 
not  P  xV,  Since  a  point  oiV  xV  satisfying  V>  is  a  point  of  P  x  V  (recall 
that  if)  D^p  running  P’),  we  have 

P  X  V  1=  init  D  Pr[0[if)  D  complete{M,  x  E  Z)]]  >  1  —  a  |*|~* , 

as  desired,  and  hence  M  is  complete  iot  x  E  L  given  if).  □ 


Proposition  5.4;  If 

P  X  V*  1=  (®  €  Z  A  *p  running  P’)  D  Ok^{x  E  Z), 

where  if)  *=  halted  A  ‘p  running  P\  then  we  can  effectively  modify  V*  to 
obtain  V  such  that  (P,  V)  is  an  interactive  proof  system  for  Z. 


Proof:  Let  Af  be  a  test  for  Kv{x  E  Z),  and  hence  for  x  E  L,  that  is  practi¬ 
cally  sound,  and  practically  complete  given  if).  Such  a  test  M  is  guaranteed 
to  exist  by  the  definition  of  practical  knowledge  given  if).  We  assume  without 
loss  of  generality  that  M  accepts  with  probabilities  and  1  —  2"l*l  instead 
of  1/3  and  2/3.®  Let  V  be  the  protocol  in  which  the  verifier  (i)  runs  the 


^Wc  can  always  tiansfoitil  a  icSl  m  accepting  with  piobabiliiies  i/3  and  2/3  into  a  test 
M'  accepting  with  probabilities  and  1  —  by  using  the  standard  trick  of  running 
the  test  M  many  times  to  estimate  the  probability  with  which  M  accepts  or  rejects. 
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protocol  y*,  (ii)  runs  the  test  M  once  V*  halts,  and  (iii)  accepts  ilF  M  ac¬ 
cepts.  We  now  show  that  (P,  V)  satisfies  the  soundness  and  completeness  of 
Proposition  5.1,  and  hence  must  be  an  interactive  proof  system  for  L.  Given 
a  run  r  olV  xV  and  a  run  r*  of  P  x  F*,  we  say  that  r  and  r*  are  corre- 
sponding  runs  if  the  two  runs  have  the  same  initial  state,  and  the  sequences 
of  coins  flipped  in  the  two  rims  are  the  same.  We  say  that  (r,A)  and  {T*,k) 
are  corresponding  points. 

We  first  prove  that  (P,  V)  satisfies  the  soundness  condition 

P  X  y  {=  init  D  Pr(Oacccpt  D  x  G  L]  >  1  —  a  |®|“* . 

of  Proposition  5.1.  Since  M  is  practically  sound  for  x  €  L  in  P  x  V",  we 
have 

P  X  y*  j=  init  D  Pr{Osound{M,x  €  L))  >  1  —  a  Ixp*  . 

Recall  that  sov,nd{M,  x  (z  L)  holds  at  a  point  if  at  that  point  x  ^  L  implies 
pr[M  rejects]  >  1  —  Remember  that  the  probability  here  is  being  taken 
over  M’s  coin  flips  (and  not  over  runs),  and  that  this  condition  is  a  fact  about 
the  global  state  (even  a  fact  about  the  verifier’s  local  state,  the  input  to  the 
test  M).  If  we  take  this  condition  as  a  primitive  proposition  in  our  language, 
then  sound{M,x  G  L)  is  equivalent  to  the  formula  x^  LD  pr[M  rejects]  > 
1  —  2~l*i.  It  follows  that 


Osound{M,  X  G  i) 


implies 

□(x  ^  LD  pr[M  rejects]  >  1  —  2"l"l). 

We  claim  that,  given  corresponding  runs  r  and  r*  of  P  x  y  and  P  X  y*, 
if  the  initial  point  (r*,0)  satisfies  Osound{MyX  G  L)  and  hence  satisfies 

□(x  0  If  D  pr[M  rejects]  >  1  —  2“'*'), 

then  the  initial  point  (r,  0)  satisfies 

X  ^  LD  0{Pr[Oreject]  >  1  —  2“*"'). 

To  see  this,  let  I  be  the  time  at  which  the  verifier  has  finished  the  protocol 
y*  in  r  and  r*  and  starts  the  test  M  in  r.  If  (r*,0)  \=x  ^  L,  then  (t**,^)  \= 
pr[M  rejects]  >  1  —  2~I*L  Consequently,  if  (r,0)  |=  x  i,  then  (r,/)  |= 
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Pr[Oreject]  >  1  —  (Remember  that  the  probability  is  being  taken  oyer 
M’s  coin  flips  at  (r*,/)  and  over  runs  at  It  follows  that  (r,0)  satisfles 

the  formula  x  ^  LD  C>{Pr[Oreject]  >  1  —  2“l*l),  as  desired. 

Now  let  (r,0)  be  any  initial  point  oiV  xV,  and  let  (r*,0)  be  the  corre¬ 
sponding  initial  point  of  P  X  V*.  Since  the  soundness  of  M  guarantees  that 
the  initial  point  (r*,0)  must  satisfy  the  formula  Pr[nsound(M,®  6  X))]  > 
1  —  a  |x|~*,  the  preceding  argument  shows  that  the  initial  point  (r,0)  must 
satisfy  the  formula 

Pr[x  ^LD  0{Pr[Oreject]  >  1  -  2-1*1)]  >  1  -  a  |®i“* , 

It  follows  that  (r,  0)  satisfies 

Pr[x  ^LD  Oreject]  >  (1  -  2-l*l)(l  -  a  |x|~*), 


which  implies 

Pr[Oacccpt  D  ®  €  X]  >  (1  —  2"1*1)(1  ~  a  1®]'*). 

Since 

(l-2-'*l)(l~alxr*)  >  l-aN~*-2-!*' 

9~!*! 
a\x\  * 

>  1  —  a/3  |xj~* 

for  some  /3  >  1,  it  follows  that  (r,  0)  satisfles 

Pr[Oaccepf  D  ®  €  X]  >  1  —  7 

for  some  7  >  1.  Thus,  (P,  V)  satisfies  the  soimdness  condition. 

We  now  prove  that  (P,  V)  satisfies  the  completeness  condition 

P  xV  \=  init  D  Pr[x  E  LD  <>accept]  >l  —  a  \x\~^ 

of  Proposition  5.1.  Since  M  is  complete  for  x  €  X  given  r/)  mV  x  F’*,  we 
have 

P  X  F*  1=  init  D  Pr[D(V’  D  complete{M,x  6  X))]  >  1  —  a  |®|“* . 
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As  above,  taking  pr[M  accepts]  >  1  —  2“I*1  as  a  primitive  proposition  in 
our  language,  the  condition  complete{MfX  6  L)  is  equivalent  to  the  formula 
X  G  Ld  pr[M  accepts]- >  1  —  2“l*l.  It  follows  that 

□(V'  D  complete{M,x  6  L)) 


implies 

□(V'  D  (x  €  L  D  pr[M  accepts]  >  1  —  2~'*l)) 

which  implies 

X  €  i  D  □(V'  D  pr{M  accepts]  >  1  —  2~l*l) 
since  x  G  L  is  a  fact  about  the  run. 

We  claim  that,  given  corresponding  runs  r  and  r*  of  P  X  and  P  x  V", 
if  the  initial  point  (r*,0)  satisfies  □(■0  D  complete{M^x  G  L))  and  hence 
satisfies 

X  G  LD  □(V'  D  pr[M  accepts]  >  1  -  2“l*l), 
then  the  initial  point  (r,  0)  satisfies 

xGLD  0{Pr[Oaccept]  >  1  -  2-l*l). 

To  see  this,  let  I  be  the  time  at  which  the  verifier  has  finished  the  protocol 
V*  in  r  and  r*  and  starts  the  test  M  in  r.  If  (t'*,0)  1=  x  €  i,  then  (r*,/)  [= 
pr[M  accepts]  >  1  -  2~l*l  since  |=  Consequently,  if  (r,0)  \=  x  G  L, 
then  (r,/)  |=  Pr[Oaccept]  >  1  —  2~W,  and  hence  (r,0)  satisfies  the  formula 
X  G  LD  0(Pr[Oaccept]  >  1  —  2“l*l). 

Now  let  (r,0)  be  any  initial  point  of  P  xV,  and  let  (r*,0)  be  the  cor¬ 
responding  initial  point  of  P  x  V*.  Since  the  completeness  of  M  given  ^ 
guarantees  that  the  initial  point  (r*,0)  must  satisfy  the  formula 

Pr[0{ip  D  complete(M,x  G  i/))]  >  1  -  a  |x|“* , 

the  preceding  argument  shows  that  the  initial  point  (r,  0)  must  satisfy 

Pr[x  G  L  D  0(Pr[Oaccept]  >  1  —  2“l*')]  >  1  —  a  |x|“* . 

It  follows  that  (r,  0)  satisfies 

Pr[x  G  L  D  Oaccept]  >  (1  —  2“'**)(1  —  a  |x|“*). 
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and  hence 

Pr[aj  €  i  D  Oaccepi]  >  1  — 

for  some  7  >  1  eis  above.  Thus,  {P,V)  satisfies  thv,  completeness  condition. 

□ 

Theorem  5.5:  Let  (P,  V)  be  a  zero  knowledge  proof  system  for  i,  let  F* 
be  an  arbitrary  verifier,  and  let  be  a  fact  about  the  initial  state,  7or  every 
fact  ‘iff  and  constant  k>l  there  is  a  fact  if)'  and  a  constant  oc  >  1  such  that 

PxV\={xeLA  init)  D  D  Kf{x  GLD  (p)]. 

Proof:  Given  a  fact  if)  and  a  constant  k,  we  construct  a  fact  ip'  and  constant 
a  satisfying  the  formula  above. 

Notice  that  we  can  assume  K^<p  holds  at  some  point  of  P  X  F*  (the 
theorem  is  trivially  true  if  it  does  not),  and  hence  the  existence  of  a  test 
M  for  K„(p  that  is  practically  sound  and  is  practically  complete  given  ip. 
Without  loss  of  generality  we  can  assume  two  things  about  this  test.  First, 
we  can  assume  that  M  accepts  with  probabilities  2“l*l  or  1  —  2“l“l  instead  of 
1/3  or  2/3.  Second,  since  we  assume  that  the  verifier’s  local  state  encodes 
the  verifier’s  local  history,  and  since  v?  is  a  fact  about  the  initial  state,  if  K^fp 
holds  at  any  point  of  a  proof  then  it  holds  at  the  end  of  the  proof  as  well. 
Consequently,  since  the  verifier’s  local  state  does  encode  the  verifier’s  history, 
we  can  assume  that  M  accepts  with  probability  2/3  at  the  end  of  a  proof  if  it 
does  so  at  any  point  in  the  middle  of  the  proof.  Neither  assumption  aifects  the 
fact  that  M  is  practically  sound  for  K„(p,  and  practically  complete  for  K^ip 
given  ip.  Given  the  constant  k  fixed  above,  let  03*  be  the  constant  guaranteed 
for  3^  by  the  definition  of  the  practical  soundness  and  completeness  of  M. 

We  can  also  assiime  the  existence  of  a  Turing  machine  Mv{t,  x)  that  ap¬ 
proximates  the  distribution  of  local  histories  generated  by  (P(s),  F*(<))(a!). 
In  particular,  the  following  modification  Mh  of  the  test  M  is  able  to  dis¬ 
tinguish  these  distributions  with  only  negligible  probability.  Notice  that  the 
input  to  M  is  the  verifier’s  local  state.  We  can  modify  M  to  obtain  a  test  Mh 
ih.^t  accept';  as  input  the  verifier’s  local  history  and  runs  the  test  M  at  the 
final  loczil  scate  in  the  local  history,  accepting  iff  the  test  M  accepts.  Since 
the  length  of  the  interactive  proof  is  bounded  by  some  polynomial  in  |x|,  we 
can  guarantee  that  Mh  still  runs  in  time  polynomial  in  jzj  on  arbitrary  inputs 
by  having  it  reject  outright  when  presented  with  a  history  that  is  too  long. 

Consider  now  the  test  T'  defined  as  follows: 
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T'{t,  a;):  accepted  :=  /alsc 
repeat  6  |a5|*  times 

ruu  MvitfX)  to  generate  a  local  history  H 
if  Mh  accepts  H  then 
accepted  :=  true 
end  repeat; 

if  accepted  then  accept  else  reject. 

In  a  few  moments  we  will  prove  that  T'  is  a  test  for  K^ix  £  Ld  <p)  that,  for 
some  constant  X,  is  sound  at  all  points  with  |xj  >  X,  and  is  complete  at  all 
points  with  |x|  >  X  that  satisfy 

V**  ^  inti  A  X  6  i  A  Pr[Ok^<p]  >  |x|“* . 

In  fact,  we  will  show  that  T'  accepts  with  probability  2/3  at  all  points  with 
|x|  >  X  that  satisfy  Taking  ij)*  to  be  the  fact  holding  at  points  satisfying 
|xl  >  X  and  V’i)  and  taJung  T  to  be  the  test  obtained  by  rdodifying  T'  to 
reject  outright  if  jx]  <  X,  it  will  follow  that  T  is  a  test  for  Kv{x  E  L  D  ip) 
that  is  sound  and  is  complete  given  In  fact,  T  will  accept  with  probability 
2/3  at  all  points  satisfying  tl)'. 

Given  such  a  test  T,  the  rest  of  the  proof  is  completed  as  follows.  Take 
a  —  X*  so  that  1  —  a  |xl“*  <  0  when  jx]  <  X.  Consider  ah  initial  point  c 
satisfying  x  E  L.  If  c  satisfies  Pr(OXif^|  <  1*1"*)  c  trivially  satisfies 
Pr[<>k^p  D  kf{x  E  LD  (f)]  >  1  —  |x|“  .  If  c  satisfies  jxj  <  X,  then  1  — 
a!x|“*  <  0,  and  c  trivially  satisfies  Fr[OX^V?  ^  Kf{x  E  LD  <p)]  >  1  — 
a  jx]”*'.  So  suppose  c  satisfies  Pr[Ok^(p]  >  |x|"*  and  |x|  >  X.  Notice  that 
c  satisfies  if}\  and  hence  that  T  accepts  with  probability  2/3  at  c.  Since  T  is 
sound  for  K„{x  E  LD  (p),  it  follows  that  c  satisfies  Xv(x  E  L  D<p)t  and  hence 
that  c  satisfies  D  kf{x  E  LD  y>)]  =  1.  Consequently,  all  initial 

points  c  with  x  E  L  satisfy  Pr[Okf(p  D  k^'{x  E  L  D  ^)]  >  1  —  a  |®r*»  and 
hence  satisfy  3  kf{x  E  LD  p)]aB  desired. 

It  remains  only  to  prove  that,  for  some  constant  X,  the  test  T'  is  sound 
at  points  with  |x|  >  X  and  is  complete  at  points  with  |x|  >  X  satisfying  V'c* 

We  first  prove  that  T'  is  sound  at  all  points  with  sufficiently  large  x:  {pven 
a  point  c  o{  P  xV*  satisfying  -’X»(x  E  LD  (p)  with  sufficiently  large  x,  we 
prove  that  T'  rejects  with  probability  2/3  at  c. 

Since  c  satisfies  ^X»(x  E  L  D  p),  some  point  d  of  V  xV  with  c  o' 
satisfies  -<(x  E  LD  p).  Since  T'  takes  as  input  only  x  and  t  found  in  the  v’s 
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local  state,  which  is  the  same  at  both  c  and  o',  the  test  T'  must  reject  Mrith 
the  same  probability  at  both  points.  Without  loss  of  generality,  therefore, 
we  can  assume  c  satisfies  -i(a:  6  LD(p)y  or  equivalently  that  c  satisfies  x  E  L 
but  not  (p. 

T'  rejects  at  c  iff,  on  each  iteration,  Mh  rejects  a  history  generated  by 
My-  What  is  the  probability  that  Mh  rejects  a  history  generated  by  Mv*? 
Suppose  the  point  c  fixed  above  is  the  initial  point  of  a  run  of  {P{s),  F*(t))(® ). 
Since  (P,  V)  is  a  zero-knowledge  proof  system,  we  know  that,  for  sufficiently 
large  i,  the  probability  Mh  rejects  a  history  generated  by  Myit,  x)  is  within 
of  the  probability  Mh  rejects  a  history  generated  by  (P(s),  F*(t))(x). 
But  this  latter  probability  is  just  the  probability  the  original  test  M  rejects 
at  the  end  of  a  run  of  (P(s),  V’*(t))(®).  Since  c  satisfies  -i^,  and  since  y? 
is  a  fact  about  the  initial  state,  we  know  that  -up,  and  hence  -'K„(p,  holds 
at  all  points  of  every  run  of  (P(s),  V*(t))(x).  Since  M  is  practically  soimd 
for  ifuV’j  we  know  that,  for  sufficiently  large  x,  the  test  M  rejects  with 
probability  at  least  1  —  2“l*l  at  the  end  of  at  least  1  —  ash  >  1  — 
of  the  runs  of  (P(s),  V*(t))(x).  Consequently,  the  probability  Mh  rejects  a 
history  generated  by  Mv(t,x),  and  hence  the  probability  a  given  iteration 
of  T'  rejects  at  c,  is  at  least 

(l-2-l*l)(l-|xr*)-|xr"*  >  1-2|®|-** -2-1*1 

>  i-3|xr** 

for  sufficiently  large  ®;  and  hence  the  probability  T'  rejects  at  c  (that  is,  that 
all  6 1®|*  iterations  of  T'  reject)  is  at  least  (1—3 1®!”**)®****,  which  goes  to  1 
as  |x|  goes  to  infinity.  It  follows  that  T'  rejects  with  probability  2/3  at  c  for 
sufficiently  large  x. 

We  now  prove  that  T'  is  complete  at  all  points  satisfying  V’i  with  suffi¬ 
ciently  large  x:  given  a  point  cofPxV*  satisfying  V’*  with  sufficiently  large 
x,  we  prove  that  T'  accepts  with  probability  2/3  at  c. 

First  consider  the  probability  a  given  iteration  of  T'  accepts  at  c.  Suppose 
the  given  point  c  is  an  initial  point  of  a  run  of  (P(s),  V'*(t))(®).  Since  (P,  V) 
is  a  zero-knowledge  proof  system,  we  know  that,  for  sufficiently  large  x,  the 
probability  Mh  accepts  a  history  generated  by  Mv.(t,»)  is  within  |®|-**  of 
the  probability  Mh  accepts  a  history  generated  by  (P(s),  F*(<))(®),  which  is 


216 


CHAPTER  5.  KNOWLEDGE  AND  ZERO  KNOWLEDGE 


precisely  the  probability  the  original  test  M  accepts  at  the  end  of  a  run  of 
(P(s),  y*(t))(®).  Since  c  satisfies  c  satisfies  Pr[Ok*(p]  >  1®!"*.  This 
mesuis  that  at  least  |x|~^  of  the  runs  of  (P(<s),  V'*(t))(®)  pass  through  a  point 
satisfying  i)  and  K„(p,  and  that  M  accepts  with  probability  at  least  2/3  at 
such  points  in  at  least  1  —  a3fc|x|“®*  >  1  —  1*1"*  of  these  runs.  Since  we 
assume  M  accepts  with  probability  2/3  at  the  end  of  a  run  if  it  does  so  in 
the  middle  of  a  run,  the  same  is  true  at  the  end  of  these  runs.  This  means 
one  iteration  of  T'  accepts  with  probability  at  least 

>  iN-‘. 

It  follows  that  a  given  iteration  of  T’  rejects  with  probability  at  most  1  — 
|xl“*  /3,  that  all  iterations  of  T'  reject  (in  which  case  T'  itself  rejects)  with 
probability  at  most  (1  -  |x|“*  /3)®l*l^,  and  hence  that  T'  accepts  with  prob¬ 
ability  at  least 


«l_(e->/3).>| 


for  sufficiently  large  x.  (Here  we  are  using  the  fact  that  (1  +  c/n)’‘  tends  to 
e®  as  n  tends  to  infinity.)  It  follows  that  T'  accepts  with  probability  2/3  at 
c  satisfying  V'i  with  sufficiently  large  x.  □ 


Lemma  5.8:  There  is  a  weak  interactive  proof  system  for  L  iff  L  is  in  BPP. 

Proof;  Suppose  (P,  V)  is  a  weak  interactive  proof  for  L.  Consider  the 
Turing  machine  M  that  on  input  x  simulates  (P,  V’)(x)  with  empty  work 
tapes.  Notice  that  since  both  P  and  V  run  in  polynomial  time,  so  does  the 
Turing  machine  M.  By  the  definition  of  a  (weak)  interactive  proof  system, 
iix  E  L  and  x  is  sufficiently  large,  then  (P,y)(®)  and  hence  M{x)  accepts 
with  probability  2/3;  and  ifx  ^  L  and  x  is  sufficiently  large,  then  (P,  V)(®) 
and  hence  M{x)  rejects  with  probability  2/3.  Since  we  can  hardwire  into  M 
whether  M  should  accept  or  reject  x  for  the  finite  number  of  insufficiently 
large  x’s,  we  can  assume  M  is  a  BPP  Turing  machine,  and  hence  that  L  is 
in  BPP. 
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Conversely,  suppose  L  is  in  BPP.  Let  M  be  a  BPP  Turing  machine  for  L, 
and  let  (P,  F)  be  the  interactive  protocol  defined  2is  follows:  on  input  x,  the 
prover’s  protocol  P  does  nothing,  and  the  verifier’s  protocol  V  runs  M{x)  and 
accepts  iff  M{x)  accepts.  Since  the  verifier  ignores  both  the  prover  and  the 
work  tapes,  it  is  clear  that  for  any  P*,  s,  and  t,  if  a  £  L,  then  (P*(«),  V’(t))(2) 
accepts  with  probability  2/3;  and  if  x  ^  L,  then  (P*(s),  V’(t))(x)  rejects  with 
probability  2/3.  It  follows  that  (P,  F)  is  a  weak  interactive  proof  system  for 
L.  □ 


Lemma  5.9:  A  weak  interactive  protocol  (P,  V)  is  a  weak  interactive  proof 
system  for  a  fact  (p  about  the  prover’s  work  tape  and  the  common  input  iff 

1.  for  all  sufficiently  large  x  and  for  edl  s,  we  have  (p{x,  s)  iff  x  £  doTn{<p)] 
and 

2.  dom{<p)  is  in  BPP. 


Proof:  Suppose  (P,  V)  is  a  weak  interactive  proof  system  for  a  fact  (p  about 
the  prover’s  work  tape  and  the  common  input.  Fix  k  and  let  Nk  be  the  con¬ 
stant  given  by  the  soundness  and  complete  conditions  for  a  weak  interactive 
proof  system. 

To  prove  part  1,  suppose  for  some  x  with  |x|  >  Nk  we  have  ^(®,  s)  and 
-■^(x,  s'),  and  consider  the  prover  P,  that  ignores  its  work  tape  and  simulates 
the  protocol  P  on  work  tape  s.  Since  9?(x,  s),  we  know  the  verifier  must  accept 
in  (P(s),  F(t))(x)  with  probability  at  least  1  —  |x|“*.  Since  ->ip{x,a'),  we 
know  the  verifier  must  accept  in  (P,(5'),  F(t))(x)  with  probability  at  most 
|x|~*.  Notice,  however,  that  the  prover  P,  on  work  tape  s'  simulates  the 
prover  P  on  work  tape  s,  and  hence  the  two  distributions  (P(5))  V’(t))(x) 
and  (P*(s'),  F(t))(x)  are  identical.  Consequently,  the  verifier  must  accept 
with  the  same  probability  in  both  (P(s),  F(<))(x)  and  (P,(s'),  F(<))(x),  a 
contradiction.  It  follows  that,  for  all  sufficiently  large  x  with  |x|  >  N*,  we 
have  X  £  dom{(p)  iff  y)(x,  s')  for  some  s'  iff  tp{x,  s)  for  all  s. 

To  prove  part  2,  let  M  be  the  Turing  machine  that  on  input  x  simulates 
(P,  F)(x)  with  empty  work  tapes.  Since  P  and  V  run  in  polynomial  time, 
so  does  M.  By  part  1  and  the  definition  of  a  weak  interactive  proof  system 


(P,  F),  if  {xj  >  Nk  and  x  £  dom((p),  then  <p(x,e)  is  satisfied  (where  e  is  the 


empty  string),  so  (P,  F)(x)  and  hence  M(x)  accepts  with  probability  2/3; 
and  if  |x|  >  Nk  and  x  ^  dom{(p),  then  tp{x,  e)  is  not  satisfied,  so  (P,  F)(x) 
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and  hence  M{x)  rejects  with  probability  2/3.  Since  we  can  hardwire  into  M 
whether  M  should  accept  or  reject  x  for  the  finite  number  of  insufficiently 
l2irge  x’s,  we  can  assume  M  is  a  BPP  Turing  machine,  and  hence  that  dom{tp) 
is  in  BPP. 

Conversely,  suppose  parts  1  and  2  are  satisfied.  Since  dom{(p)  is  in  BPP, 
we  know  by  Lemma  5.8  that  there  is  a  weak  interactive  proof  system  (P,  V) 
for  dom{(p).  By  part  1,  for  every  k  and  sufficiently  large  x,  if  yj(x,s)  then 
X  €  dom{<fi)  and  (P(s),  V(t))(x)  accepts  with  probability  at  least  1  —  [xp*; 
and  if  -'^(x,s)  then  x  ^  dom{(p)  {P*{a),V{t)){x)  accepts  with  probability 
at  most  lx|“*  for  all  provers  P*.  It  follows  that  (P,  V)  is  a  weak  interactive 
proof  system  for  R  as  well.  □ 

Proposition  5.10:  (P,  V)  is  an  interactive  proof  satisfying  the  correctness 
condition  that  the  prover  can  generate  a  y  such  that  R[x,y)  iff  (P,  V)  is  a 
weak  interactive  proof  system  for 

Proof:  Suppose  (P,  V")  is  an  interactive  proof  satisfying  the  correctness  con¬ 
dition  that  the  prover  can  generate  a  y  such  that  R{x,y).  We  prove  that 
(P,  V)  is  a  weak  interactive  proof  for  tpR.  For  completeness,  if  <Pr{P,x,s) 
holds  then  R{x,a)  holds,  and  (P(s),  V’(t))(x)  accepts  with  high  probability 
by  the  completeness  condition  for  an  interactive  proof  of  [TW87],  so  (P,  V) 
satisfies  the  completeness  condition  for  a  weak  interactive  proof  of  ipR.  For 
soundness,  suppose  -i^ij(P*,x,s)  holds.  Since  the  definition  of  an  interac¬ 
tive  proof  of  [TW87]  guarantees  that  the  soundness  condition  holds  for  all 
prover  protocols  P*,  it  is  impossible  for  the  fact  ¥>ji(P*,  x,  s)  to  be  false  when 
P*  ^  P.  The  only  way  for  pr{P*^x,s)  to  be  false  is  when  P*  =  P,  in  which 
case  the  only  way  for  (Pr{P*,  x,  a)  to  be  false  is  if  R{x,  a)  is  false.  In  this 
case,  the  correctness  condition  guarantees  (P(a),  V’(t))(x)  accepts  with  low 
probability,  and  hence  {P,V)  satisfies  the  soundness  condition  for  a  weak 
interactive  proof  of  tpR. 

Conversely,  suppose  (P,  V)  is  a  weak  interactive  proof  for  <pR.  We  prove 
that  (P,  y )  is  an  interactive  proof  satisfying  the  correctness  condition  that 
the  prover  can  generate  a  y  such  that  P(x,  y).  The  correctness  condition  is 
clearly  satisfied,  since  -'P(x,  a)  implies  -'(pR{Py  x,  a),  in  which  case  the  sound¬ 
ness  condition  for  a  weak  interactive  proof  guarantees  y(i))(®)  accepts 
with  low  probability.  The  completeness  condition  is  also  clearly  satisfied, 
since  P(x,  a)  implies  (pr{P,  x,  a),  in  which  case  the  completeness  condition 
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for  a  weak  interactive  proof  guarantees  (P(a),  V{t)){x)  accepts  with  high 
probability.  The  definition  of  (pR  shows  the  soundness  condition  is  satisfied 
for  prover  protocols  P*  ^  P,  so  consider  the  protocol  P.  Since  the  complete¬ 
ness  condition  guaxantees  that  (P(3),  F(i))(x)  accepts  with  low  probability 
when  c(x,  s)  holds,  the  trivial  generator  Mp  that  simply  returns  s  shows 
that  the  soundness  condition  is  satisfied  for  the  prover  protocol  P  as  well. 
Thus,  (P,  V)  is  an  interactive  proof  the  prover  can  generate  a  y  such  that 
R{x,y).  □ 
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CHAPTER  5.  KNOWLEDGE  AND  ZERO  KNOWLEDGE 


Chapter  6 
Conclusion 


Since  the  work  of  Halpern  and  Moses.  [HM84],  a  number  of  papers  have 
analyzed  problems  in  distributed  computation  in  terms  of  knowledge.  Our 
goal  has  been  to  apply  knowledge  to  new  problems,  and  to  expand  the  domain 
of  problems  to  which  knowledge  can  be  applied. 

The  work  in  Chapter  3  shows  how  powerful  reasoning  about  knowledge 
can  be.  Using  the  close  relationship  between  common  knowledge  and  si¬ 
multaneity,  we  have  obtained  general,  unifying  results  about  computation  in 
unreliable  systems.  We  have  identified  a  general  class  of  problems,  including 
the  well-known  consensus  and  distributed  firing  squad  problems,  and  shown 
how  to  transform  the  specification  of  such  problems  into  protocols  that  are 
optimal  in  a  very  strong  sense.  The  state  of  common  knowledge  has  played 
a  central  role  in  the  derivation  of  these  protocols.  In  the  process  of  imple¬ 
menting  tests  for  common  knowledge  we  have  exposed  a  number  of  subtle 
differences  between  variants  of  the  well-known  omissions  failure  model.  This 


work  has  shown  how  knowledge  can  be  used  in  both  protocol  design  and  in 
the  derivation  of  nontrivial  lower  bounds  on  computational  complexity.  It 
is  not  at  all  clear  how  the  observations  leading  to  these  results  would  have 
been  obtained  had  we  not  been  thinking  about  these  problems  in  terms  of 
knowledge. 

While  this  work  shows  that  reasoning  about  knowledge  can  be  beneficial, 
we  have  observed  that  in  some  contexts  the  standard  definition  of  knowledge 


does  not  appear  to  be  the  most  appropriate  definition.  In  the  second  li 


this  thesis,  we  have  studied  definitions  of  knowledge  for  use  in  two  of  these 


contexts. 
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In  the  context  of  probabilistic  protocols,  the  standard  definition  of  knowl¬ 
edge  does  not  enable  us  to  capture  a  notion  of  confidence  that  can  be  useful 
when  reasoning  about  such  protocols.  In  Chapter  4,  using  the  framework  de¬ 
veloped  by  Fagin  and  Halpern  [FH88],  we  have  examined  various  definitions 
of  probabilistic  knowledge  that  let  us  capture  several  different  notions  of  con¬ 
fidence.  We  have  observed  that  there  is  no  one  notion  of  confidence  that  is 
most  appropriate  in  all  contexts.  The  best  way  to  think  about  the  various 
definitions  is  in  terms  of  betting  games  and  betting  against  different  types 
of  adversaries.  We  have  shown,  for  every  given  adversary,  how  to  construct 
the  definition  of  probabilistic  knowledge  that  is  provably  the  best  definition 
in  the  context  of  that  particular  adversary.  We  have  shown  how  these  defini¬ 
tions  can  be  used  to  analyze  a  probabilistic  variant  of  the  coordinated  attack 
problem. 

Cryptography  is  another  context  in  which  the  standard  definition  of 
knowledge  does  not  capture  all  relevant  aspects  of  the  problems  at  hand. 
This  is  due  primarily  to  the  fact  that  the  standard  definition  does  not  allow 
us  to  express  the  fact  that  the  bounds  on  an  agent’s  computational  powers  af¬ 
fect  what  that  agent  can  know.  In  Chapter  5  we  have  shown  how  the  context 
of  cryptography  motivates  the  definition  of  practical  knowledge,  a  definition 
of  knowledge  incorporating  both  probability  and  limitations  on  agents’  com¬ 
putational  powers.  We  have  show  how  the  definition  of  practical  knowledge 
csm  be  used  to  characterize  interactive  proof  systems,  and  to  capture  the  in¬ 
tuition  that  a  verifier  learns  essentially  nothing  as  a  result  of  a  zero  knowledge 
proof  other  than  the  fact  the  prover  initially  sets  out  to  prove.  Finally,  we 
have  sketched  how  it  is  possible  to  reason  about  such  proof  systems  directly 
in  terms  of  knowledge,  rather  than  in  terms  of  the  operational  cryptographic 
definitions. 

While  we  feel  that  our  work  represents  significant  progress  in  the  attempt 
to  extend  the  standard  definitions  of  knowledge  into  other  contexts,  a  number 
of  problems  remain.  In  particular,  while  we  have  shown  that  our  definition 
of  practical  knowledge  can  be  useful  in  contexts  where  agents’  computational 
limitations  are  of  interest,  it  is  by  no  means  clear  that  it  is  the  most  appro¬ 
priate  definition.  In  fact,  it  is  not  even  clear  what  criteria  one  should  use 
when  judging  the  suitability  of  a  definition  in  this  context.  Further  progress 
in  this  area  is  of  great  importance. 

While  we  have  noted  at  the  end  of  each  chapter  a  number  of  open  prob¬ 
lems  that  remain  to  be  resolved,  we  note  that  there  are  two  general  areas  in 
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which  knowledge  could  possibly  play  a  larger  role  than  it  has  so  far.  First, 
notice  that  the  majority  of  the  results  in  this  thesis  have  been  in  the  context 
of  synchronous  systems.  This  is  generally  true  in  the  literature  as  a  whole. 
The  role  of  knowledge  in  the  context  of  asynchronous  systems  has  been  pri¬ 
marily  as  a  tool  for  proving  lower  bounds,  but  not  so  much  as  a  tool  for 
the  design  of  new  protocols.  This  is  somewhat  surprising,  since  one  of  the 
commonly  mentioned  motivations  for  formulating  definitions  of  knowledge  in 
the  first  place  is  to  capture  informal  statements  such  as  “since  p  has  received 
message  m  &om  9,  p  knows  the  task  started  at  q  has  terminated.”  Such 
statements  often  arise  in  the  context  of  communications  protocols,  for  exam¬ 
ple.  These  protocols  are  often  quite  complex,  and  it  would  be  interesting  to 
know  whether  a  knowledge-based  analysis  could  make  such  protocols  easier 
to  understand,  and  easier  to  construct; 

Finally,  we  note  that  it  is  becoming  increasingly  important  to  be  able  to 
reason  explicitly  about  time  when  designing  protocols.  For  example,  timeouts 
play  an  important  role  in  the  protocols  designed  for  asynchronous  systems. 
Designers  often  explain  these  protocols  as  if  the  processors  themselves  must 
explicitly  reason  about  how  their  knowledge  of  the  system  changes  as  a  re¬ 
sult  of  whether  a  given  timeout  occurs  or  not.  It  woiild  be  interesting  to 
understand  how  to  reason  about  timeouts  (and  time  in  general)  directly  in 
terms  of  formal  notions  of  knowledge.  Much  remains  to  be  done. 
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